HAEMOPHILUS ADHESION PROTEINS

The U.S. Government has certain rights in this invention pursuant to grant numbers
AI-21707 and HD-29687 from the National Institutes of Health.

FIELD OF THE INVENTION 

The invention relates to novel Haemophilus adhesion proteins, nucleic acids, and
antibodies.

BACKGROUND OF THE INVENTION 

Most bacterial diseases begin with colonization of a particular mucosal surface
(Beachey et al., 1981, J. Infect. Dis. 143:325-345). Successful colonization requires
that an organism overcome mechanical cleansing of the mucosal surface and evade
the local immune response. The process of colonization is dependent upon
specialized microbial factors that promote binding to host cells (Hultgren et al.,
1993, Cell 73:887-901). In some cases the colonizing organism will subsequently
enter (invade) these cells and survive intracellularly (Falkow, 1991, Cell
65:1099-1102).

Haemophilus influenzae is a common commensal organism of the human respiratory
tract (Kuklinska and Kilian, 1984, Eur. J. Clin. Microbiol. 3:249-252). It is the most
common [...] disease [...] fraction of acute and chronic otitis media, sinusitis,
bronchitis, and pneumonia.

[...] As a result, this bacterium is an important cause of both localized
respiratory tract and systemic disease [...]. Nonencapsulated, nontypable strains
account for the majority of [...] capsule composed of a polymer of ribose and [...]
(S.H. Sell and P.F. Wright (ed.), [...] immunology [...] disease.
Elsevier/North-Holland Publishing Co., New York).

The initial step in the pathogenesis of disease due to H. influenzae [...]
colonization [...] a particular strain may persist for weeks to months [...] 1986,
J. Infect. Dis. 154:100-109). However, in certain circumstances [...]
on occasion bacteria will penetrate the nasopharyngeal epithelial barrier and enter
the bloodstream.

In vitro observations and animal studies suggest that bacterial surface appendages
called pili (or fimbriae) play an important role in H. influenzae colonization. In
1982 two groups reported a correlation between piliation and increased attachment
to human oropharyngeal epithelial cells and erythrocytes (Guerina et al., supra;
Pichichero et al., supra). Other investigators have demonstrated that anti-pilus
antibodies block in vitro attachment by piliated H. influenzae (Forney et al., 1991,
J. Infect. Dis. 165:464-470; van Alphen et al., 1988, Infect. Immun. 56:1800-1806).
Recently Weber et al. insertionally inactivated the pilus structural gene in an H.
influenzae type b strain and thereby eliminated expression of pili; the resulting
mutant exhibited a reduced capacity for colonization of year-old monkeys (Weber
et al., 1991, Infect. Immun. 59:4724-4728).

A number of reports suggest that nonpilus factors also facilitate Haemophilus
colonization. Using the human nasopharyngeal organ culture model, Farley et al.
(1986, J. Infect. Dis. 161:274-280) and Loeb et al. (1988, Infect. Immun.
49:484-489) noted that nonpiliated type b strains were capable of mucosal
attachment. Read and coworkers made similar observations upon examining nontypable
strains in a model that employs nasal turbinate tissue in organ culture (1991,
J. Infect. Dis. 163:549-558). In the monkey colonization study by Weber et al.
(1991, supra), nonpiliated organisms retained a capacity for colonization, though
at reduced densities; moreover, among monkeys originally infected with the piliated
strain, virtually all organisms recovered from the nasopharynx were nonpiliated.
All of these observations are consistent with the finding that nasopharyngeal
isolates from children colonized with H. influenzae are frequently nonpiliated
(Mason et al., 1985, Infect. Immun. 49:98-103; Brinton et al., 1989, Pediatr.
Infect. Dis. J. 8:554-561).

Previous studies have shown that H. influenzae are capable of entering (invading)
cultured human epithelial cells via a pili-independent mechanism (St. Geme and
Falkow, 1990, supra; St. Geme and Falkow, 1991, supra). Although H. influenzae
is not generally considered an intracellular parasite, a recent report suggests that
these in vitro findings may have an in vivo correlate (Forsgren et al., 1994, supra).
Forsgren and coworkers examined adenoids from 10 children who had their adenoids
removed because of longstanding [...]. In all 10 cases there were viable
intracellular H. influenzae. Electron microscopy demonstrated that these organisms
were concentrated in the reticular crypt epithelium and in [...]. [...]
possibility is that bacterial entry into host cells provides a mechanism for evasion
of the local immune response [...].

Thus, a vaccine for the therapeutic and prophylactic treatment of Haemophilus
infection is desirable. Accordingly, it is an object of the present invention to provide
for recombinant Haemophilus [...]
to produce useful quantities of these HA proteins using recombinant DNA
techniques.

It is a further object of the invention to provide recombinant nucleic acids encoding
HA proteins, and expression vectors and host cells containing the nucleic acid
encoding the HA protein.

An additional object of the invention is to provide monoclonal antibodies for the
diagnosis of Haemophilus infection.

A further object of the invention is to provide methods for producing the HA
proteins, and a vaccine comprising the HA proteins of the present invention.
Methods for the therapeutic and prophylactic treatment of Haemophilus infection
are also provided.

SUMMARY OF THE INVENTION 

In accordance with the foregoing objects, the present invention provides recombinant
HA proteins, and isolated or recombinant nucleic acids which encode the HA
proteins of the present invention. Also provided are expression vectors which
comprise DNA encoding a HA protein operably linked to transcriptional and
translational regulatory DNA, and host cells which contain the expression vectors.

The invention also provides methods for producing HA proteins which
comprise culturing a host cell transformed with an expression vector and causing
expression of the nucleic acid encoding the HA protein to produce a recombinant
HA protein.

The invention also includes vaccines for Haemophilus influenzae infection
comprising an HA protein for prophylactic or therapeutic use in generating an
immune response in a patient. Methods of treating or preventing Haemophilus
influenzae infection comprise administering a vaccine.

BRIEF DESCRIPTION OF THE DRAWINGS 

Figures 1A, 1B, and 1C depict the nucleic acid sequence of HA1.

Figure 2 depicts the amino acid sequence of HA1.

Figures 3A, 3B, 3C, 3D, 3E, 3F and 3G depict the nucleic acid sequence and amino
acid sequence of HA2.

Figure 4 shows the schematic alignment of HA1 and HA2. Regions of sequence
similarity are indicated by shaded, striped, and open bars, corresponding to
N-terminal domains, [...].
Numbers above the bars refer to amino acid residue positions in the [...]
proteins. Numbers in parentheses below the HA2 bars represent [...]
domains. The regions of HA2 defined by amino acid residues [...]
846, and 1292 to 1475 show minimal similarity to amino acids 51 to 220 of HA1.

Figure 5 depicts the homology between the N-terminal amino acid sequences of
HA1 and HA2. Single letter abbreviations are used for the amino acids. A line
indicates identity between the residues, and two dots indicate conservative changes,
i.e. similarity between residues.

[...] Vector sequences are represented [...]
intragenic fragment used as a probe for Southern analysis. The plasmid pDC602,
which is not shown, contains the same insert as pDC601, but in the opposite
orientation.

Figure 8 shows the identification of plasmid-encoded proteins using the
bacteriophage T7 expression system. Bacteria were radiolabeled with
trans-[35S]-label, and whole cell lysates were resolved on a 7.5%
SDS-polyacrylamide gel. Proteins were visualized by autoradiography. Lane 1,
E. coli BL21(DE3)/pT7-7 uninduced; lane 2, BL21(DE3)/pT7-7 induced; lane 3,
BL21(DE3)/pDC602 uninduced; lane 4, BL21(DE3)/pDC602 induced; lane 5,
BL21(DE3)/pDC601 uninduced; lane 6, BL21(DE3)/pDC601 induced. The
plasmids pDC602 and pDC601 are derivatives of pT7-7 that contain the 8.3 kb XbaI
fragment from pDC400 in opposite orientations. The asterisk indicates the
overexpressed protein in BL21(DE3)/pDC601.

Figure 9 depicts the Southern analysis of chromosomal DNA from H. influenzae
strains C54 and 11, probing with HA2 versus HA1. DNA fragments were separated
on a 0.7% agarose gel and transferred bidirectionally to nitrocellulose membranes
prior to probing with either HA1 or HA2. Lane 1, C54 chromosomal DNA digested
with BglII; lane 2, C54 chromosomal DNA digested with ClaI; lane 3, C54
chromosomal DNA digested with PstI; lane 4, 11 chromosomal DNA digested with
BglII; lane 5, 11 chromosomal DNA digested with ClaI; lane 6, 11 chromosomal
DNA digested with XbaI. A. Hybridization with the 3.3 kb PstI-BglII intragenic
fragment of HA2 from strain C54. B. Hybridization with the 1.6 kb StyI-SspI
intragenic fragment of HA1 from strain 11.

Figure 10 depicts the comparison of cellular binding specificities of E. coli DH5α
harboring HA2 versus HA1. Adherence was measured after incubating bacteria
with eucaryotic cell monolayers for 30 minutes as described and was calculated
by dividing the number of adherent colony forming units by the number of
inoculated colony forming units (St. Geme et al., 1993). Values are the mean ±
SEM of measurements made in triplicate from representative experiments. The
plasmid pDC601 contains the HA2 gene from H. influenzae strain C54, while
pHMW8-5 contains the HA1 gene from nontypable H. influenzae strain 11. Both
pDC601 and pHMW8-5 were prepared using pT7-7 as the cloning vector.
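
The adherence calculation described for Figure 10 (adherent colony forming units divided by inoculated colony forming units, reported as mean ± SEM of triplicate measurements) can be sketched as follows. This is a minimal illustration only: the function names and the CFU counts are hypothetical and are not taken from the specification.

```python
from math import sqrt
from statistics import mean, stdev


def adherence_fraction(adherent_cfu: float, inoculated_cfu: float) -> float:
    """Fraction of the inoculum that remained attached to the monolayer."""
    if inoculated_cfu <= 0:
        raise ValueError("inoculated CFU must be positive")
    return adherent_cfu / inoculated_cfu


def mean_and_sem(values: list[float]) -> tuple[float, float]:
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    if len(values) < 2:
        raise ValueError("at least two replicates are needed for an SEM")
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical triplicate: (adherent CFU, inoculated CFU) per well.
replicates = [(50, 1000), (60, 1000), (70, 1000)]
fractions = [adherence_fraction(a, i) for a, i in replicates]
avg, sem = mean_and_sem(fractions)
print(f"adherence = {100 * avg:.2f}% +/- {100 * sem:.2f}% (n={len(fractions)})")
```

The SEM here uses the sample standard deviation; whether the original measurements used that convention is an assumption.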
Figure 11 depicts the comparison of the N-terminal extremities of HA2,
HMW1, AIDA-I, Tsh, and SepA. [...]
(Barenkamp, S.J., and E. Leininger, 1992, [...]),
AIDA-I ([...] M.A. Schmidt, AIDA-I, the adhesin involved in diffuse adherence
of the diarrhoeagenic Escherichia coli [...] Mol. Microbiol. [...]),
Tsh (Provence, D., and R. Curtiss III, 1994, Isolation [...] Infect. Immun. [...]),
and SepA ([...] SepA, the [...]) [...]
on the lower line.

Figure 12 depicts the Southern analysis of chromosomal DNA from [...]
strain 1070; lane 10, strain 1076; lane 11, strain 1084.

Figure 13 depicts the Southern analysis of chromosomal DNA from [...]
separated on a 0.7% agarose gel, transferred to nitrocellulose, and probed with the
3.3 kb PstI-BglII intragenic fragment of hsf from strain C54. Lane 1, SM4 (type
a); lane 2, SM72 (type c); lane 3, SM6 (type d); lane 4, Rd (type d); lane 5, SM7
(type e); lane 6, 142 (type e); lane 7, 327 (type e); lane 8, 351 (type e); lane 9, 134
(type f); lane 10, 219 (type f); lane 11, 346 (type f); lane 12, 503 (type f).

Figures 14A and 14B are the nucleic acid sequence of HA3. 

Figure 15 is the amino acid sequence of HA3.

Figures 16A and 16B depict the homology between the amino acid sequences of
HA1 and HA3. Single letter abbreviations are used for the amino acids. A line
indicates identity between the residues, and two dots indicate conservative changes,
i.e. similarity between residues.

DETAILED DESCRIPTION OF THE INVENTION 

The present invention provides novel Haemophilus Adhesion (HA) proteins. In
a preferred embodiment, the HA proteins are from Haemophilus strains, and in the
preferred embodiment, from Haemophilus influenzae. In particular, H. influenzae
encapsulated type b strains are used to clone the HA proteins of the invention.
However, using the techniques outlined below, HA proteins from other Haemophilus
influenzae strains, or from other bacterial species such as Neisseria spp. or
Bordetella spp. may also be obtained.

Three HA proteins, HA1, HA2 and HA3, are depicted in Figures 2, 3 and 15,
respectively. HA2 is associated with the formation of surface fibrils, which are
involved in adhesion to various host cells. HA1 has also been implicated in adhesion
to a similar set of host cells. When the HA1 or HA2 nucleic acid is expressed in
a non-adherent [...] E. coli as described below, the E. coli acquire the ability
to adhere to human host cells. It [...]
referred to as hia [...]
(Haemophilus surface fibrils).

[...] identified in several ways. A HA nucleic acid or HA protein
[...] "HA protein" if the overall homology of the protein
[...] Figure 15 [...] 80%. In some embodiments the
homology will be as high as about 90 to 95 or 98%. [...] protein that has [...]
of HA1, HA2 and HA3 is considered a HA protein.

[...] sequence.
and as between either HA1 or HA3 and HA2 is 74%. As shown in Figure 11, the
first 24 amino acids of the N-terminus of HA1 and HA2 have limited homology to
several other proteins, but this homology is 50% or less. Thus, a HA protein may
be defined as having homology to the N-terminal region of at least about 60%,
preferably at least about 70%, and most preferably at least about 80%, with
homology as high as 90 or 95% especially preferred. Similarly, the C-terminal
region of at least about 75, preferably 100 and most preferably 125 amino acid
residues is also highly homologous and can be used to identify a HA protein. As
shown in Figure 16, the homology between the C-terminal 120 or so amino acids
of HA1 and HA3 is about 98%, and as between either HA1 or HA3 and HA2 is
also about 98%. Thus homology at the C-terminus is a particularly useful way of
identifying a HA protein. Accordingly, a HA protein can be defined as having
homology to the C-terminal region of at least about 60%, preferably at least about
70%, and most preferably at least about 80%, with homology as high as 90 or 95%
especially preferred. In a preferred embodiment, the HA protein has homology
to both the N- and C-terminal regions.

In addition, a HA protein may be identified as containing at least one stretch of
amino acid homology found at least in the HA1 and HA2 proteins as depicted in
Figure 4. HA2 contains three separate stretches of amino acids (174 to 608, 847
to 1291, and 1476 to 1914, respectively) that show significant homology to the
region of HA1 defined by amino acids 221 to 658.

The HA proteins of the present invention have limited homology to the high
molecular weight protein-1 (HMW1) of H. influenzae, as well as the AIDA-I adhesin
of E. coli. For the HMW1 protein, this homology is greatest between residues
60-540 of the HA1 protein and residues 1100 to about 1550 of HMW1, with 20%
homology in this overlap region. For the AIDA-I protein, there is a roughly 50%
homology between the first 30 amino acids of AIDA-I and HA1, and the overall
homology between the proteins is roughly 22%.

In addition, the HA1, HA2 and HA3 proteins of the present invention have
homology to each other, as shown in Figures 4, 5 and 16. As between HA1 and
HA2 the homology is 81% similarity and 72% identity overall. HA3 and HA1
are 51% identical and 65% similar. Thus, for the purposes of the invention, HA1,
HA2 and HA3 are all HA proteins.

An "HA1" protein is defined by substantial homology to the sequence shown in
Figure 2. This homology is preferably greater than about 60%, more preferably
greater than about 70% and most preferably greater than 80%. In preferred
embodiments the homology will be as high as about 90 to 95 or 98%. Similarly,
an "HA2" protein may be defined by the same substantial homology to the sequence
shown in Figure 3, and a "HA3" protein is defined with reference to Figure 15, as
defined above.

In addition, for sequences which [...]
proteins shown in Figures 2, 3 and 15, it is understood that the percentage of
homology will be determined based on the number of homologous amino acids
in relation to the total number of amino acids. Thus, for example, homology of
[...]
be determined using the number of amino acids in the shorter sequence.
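
The rule above, counting homologous residues relative to the length of the shorter sequence, can be sketched as below. This is an illustrative assumption-laden example, not the method used in the specification: it treats "homologous" as strictly identical residues, expects the two sequences to be pre-aligned with "-" as the gap character, and uses made-up toy sequences.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    The denominator is the number of residues in the shorter sequence,
    not the alignment length, following the convention described above.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Count columns where both sequences carry the same residue (gaps never match).
    matches = sum(1 for x, y in zip(aligned_a, aligned_b) if x == y and x != "-")
    shorter = min(len(aligned_a.replace("-", "")), len(aligned_b.replace("-", "")))
    if shorter == 0:
        raise ValueError("sequences must contain at least one residue")
    return 100.0 * matches / shorter


# Hypothetical toy alignment: 5 identical columns, shorter sequence has 9 residues.
score = percent_identity("MNKIFNVIW-", "MNKIYRLKWS")
print(f"{score:.1f}% identity")
```

Scoring conservative substitutions as well (the "similarity" figures quoted in the text) would need a substitution matrix or residue grouping, which this sketch deliberately omits.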

HA proteins of the present invention may be shorter than the amino acid sequences
shown in Figures 2, 3 and 15. Thus, in a preferred embodiment, included within
the definition of HA proteins are portions or fragments of the sequences shown in
Figures 2, 3 and 15. Generally, the HA protein fragments may range in size from
about 7 amino acids to about 800 amino acids, with from about 15 to about 700
amino acids being preferred, and from about 100 to about 650 amino acids also
preferred. Particularly preferred fragments are sequences unique to HA; these
sequences have particular use in cloning HA proteins from other organisms, to
generate antibodies specific to HA proteins, or for particular use as a vaccine.
Unique sequences are easily identified by those skilled in the art after examination
of the HA protein sequence and comparison to other proteins; for example, by
examination of the sequence alignment shown in Figures 5 and 16. Preferred unique
sequences include the N-terminal region of the HA1, HA2 and HA3 sequences,
comprising roughly 50 amino acids and the C-terminal 120 amino acids, depicted
in Figures 2, 3 and 15. HA protein fragments which are included within the
definition of a HA protein include N- or C-terminal truncations and deletions which
still allow the protein to be biologically active; for example, which still allow
adherence, as described below. In addition, when the HA protein is to be used to
generate antibodies, for example as a vaccine, the HA protein must share at least
one epitope or determinant with the sequences shown in Figures 2, 3 and 15. In
a preferred embodiment, the epitope is unique to the HA protein; that is, antibodies
generated to a unique epitope exhibit little or no cross-reactivity with other proteins.
However, cross-reactivity with other proteins does not preclude such epitopes or
antibodies for immunogenic or diagnostic uses. By "epitope" or "determinant"
herein is meant a portion of a protein which will generate and/or bind an antibody.

Thus, in most instances, antibodies made to a smaller HA protein will be able to
bind to the full length protein.

In some embodiments, the fragments of the HA protein used to generate antibodies
are small; thus, they may be used as haptens and coupled to protein carriers to
generate antibodies, as is known in the art.

In addition, sequences longer than those shown in Figures 2, 3 and 15 are also
included within the definition of HA proteins.

Preferably, the antibodies are generated to a portion of the HA protein which is
exposed at the outer membrane [...]
of HA1, HA2 and HA3 are believed to be externally exposed proteins.

The HA proteins may also be identified as associated with bacterial adhesion. Thus,
deletions of the HA proteins from the naturally occurring microorganism such as
Haemophilus species result in a decrease or absence of binding ability. In some
embodiments, the expression of the HA proteins in non-adherent bacteria such
as E. coli results in the ability of the organism to bind to cells.

In the case of the nucleic acid, the overall homology of the nucleic acid sequence
is commensurate with amino acid [...]
in the genetic code and codon bias of different organisms. Accordingly, the nucleic
acid sequence homology may be either lower or higher than that of the protein
sequence. Thus the homology of the nucleic acid sequence as compared to the
nucleic acid sequences of Figures 1, 3 and 14 is preferably greater than about 40%,
more preferably greater than about 60% and most preferably greater than 80%.
In some embodiments the homology will be as high as about 90 to 95 or 98%.

As outlined for the protein [...]
acids with substantial homology to the unique N-terminal and C-terminal regions
of the HA1, HA2 and HA3 sequences.

In one embodiment, the nucleic [...]
studies. Thus, for example, nucleic acids which hybridize under high stringency
to all or part of the nucleic acid sequences shown in Figures 1, 3 and 14 are
considered HA protein genes. High stringency conditions include, but are not
limited to, washes with 0.1XSSC at 65°C for 2 hours.
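
As a rough aid to reading the "0.1XSSC" wash condition, the sketch below converts an SSC fold-strength into salt concentrations and a dilution volume. It assumes the conventional laboratory definition of 1X SSC (150 mM NaCl, 15 mM sodium citrate) and a 20X stock; neither figure is stated in this specification, and the function names are hypothetical.

```python
# Conventional composition of 1X SSC in mM (an assumption, not from the source text).
SSC_1X_MM = {"NaCl": 150.0, "sodium citrate": 15.0}


def ssc_concentrations(fold: float) -> dict[str, float]:
    """Salt concentrations (mM) in an SSC buffer of the given fold-strength."""
    if fold <= 0:
        raise ValueError("fold-strength must be positive")
    return {salt: mm * fold for salt, mm in SSC_1X_MM.items()}


def stock_volume_ml(target_fold: float, final_volume_ml: float,
                    stock_fold: float = 20.0) -> float:
    """Volume of concentrated stock needed (C1*V1 = C2*V2)."""
    if target_fold > stock_fold:
        raise ValueError("target cannot be more concentrated than the stock")
    return final_volume_ml * target_fold / stock_fold


wash = ssc_concentrations(0.1)
print({salt: round(mm, 3) for salt, mm in wash.items()})
print(round(stock_volume_ml(0.1, 1000.0), 3), "mL of 20X stock per litre")
```

Lower salt and higher temperature both raise stringency, which is why a 0.1X wash at 65°C retains only closely matched hybrids.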
The HA proteins and nucleic acids of the present invention are preferably
recombinant. As used herein, "nucleic acid" may refer to either DNA or RNA, or
molecules which contain both deoxy- and ribonucleotides. The nucleic acids include
genomic DNA, cDNA and oligonucleotides including sense and anti-sense nucleic
acids. Specifically included within the definition of nucleic acid are anti-sense
nucleic acids. An anti-sense nucleic acid will hybridize to the corresponding
non-coding strand of the nucleic acid sequences shown in Figures 1, 3 and 14, but may
contain ribonucleotides as well as deoxyribonucleotides. Generally, anti-sense
nucleic acids function to prevent expression of mRNA, such that a HA protein is
not made, or made at reduced levels. The nucleic acid may be double stranded,
single stranded, or contain portions of both double stranded and single stranded
sequence. By the term "recombinant nucleic acid" herein is meant nucleic acid,
originally formed in vitro by the manipulation of nucleic acid by endonucleases,
in a form not normally found in nature. Thus an isolated HA protein gene, in a linear
form, or an expression vector formed in vitro by ligating DNA molecules that are
not normally joined, are both considered recombinant for the purposes of this
invention; i.e. the HA nucleic acid is joined to other than the naturally occurring
Haemophilus chromosome in which it is normally found. It is understood that once
a recombinant nucleic acid is made and reintroduced into a host cell or organism,
it will replicate non-recombinantly, i.e. using the in vivo cellular machinery of the
host cell rather than in vitro manipulations; however, such nucleic acids, once
produced recombinantly, although subsequently replicated non-recombinantly, are
still considered recombinant for the purposes of the invention.

Similarly, a "recombinant protein" is a protein made using recombinant techniques,
i.e. through the expression of a recombinant nucleic acid as depicted above. A
recombinant protein is distinguished from naturally occurring protein by at least
one or more characteristics. For example, the protein may be isolated away from
some or all of the proteins and compounds with which it is normally associated
in its wild type host, or found in the absence of the host cells themselves. Thus, the
protein may be partially or substantially purified. The definition includes the
production of a HA protein from one organism in a different organism or host cell.
Alternatively, the protein may be made at a significantly higher concentration than
is normally seen, through the use of an inducible promoter or high expression
promoter, such that the protein is made at increased concentration levels.
[...]
addition of an epitope tag or amino acid substitutions, insertions and deletions.
Furthermore, although [...]
of proteins which are synthesized chemically, using the sequence information of
Figures 2, 3 and 15, are considered recombinant herein as well.

Also included within the definition of HA protein are HA proteins from other
organisms, which are cloned and expressed as outlined below.

In the case of anti-sense nucleic acids, an anti-sense nucleic acid is defined as one
which will hybridize to all or part of the corresponding non-coding sequence of
the sequences shown in Figures 1, 3 and 14. Generally, the hybridization conditions
used for the determination of anti-sense hybridization will be high stringency
conditions, such as 0.1XSSC at 65°C.
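
Hybridization of an anti-sense molecule rests on Watson-Crick base pairing: a strand anneals to its reverse complement. The sketch below illustrates only that pairing rule on a made-up DNA fragment; the function names and the example sequence are hypothetical, and the code makes no claim about which strand of the figures serves as the target or about real hybridization behaviour under a given stringency.

```python
# Translation table for the four DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the strand that would pair perfectly with `dna` (5'->3')."""
    return dna.translate(_COMPLEMENT)[::-1]


def pairs_with(oligo: str, target: str) -> bool:
    """True if `oligo` is perfectly complementary to all or part of `target`."""
    return reverse_complement(oligo).upper() in target.upper()


# Hypothetical 12-nt target and an oligo designed against its first 6 bases.
target = "ATGGCCAAATTT"
oligo = reverse_complement(target[:6])
print(oligo, pairs_with(oligo, target))
```

A perfect-match test like this is the strictest possible criterion; imperfect duplexes that survive a given wash would require a thermodynamic model that is outside the scope of this sketch.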

Once the HA protein nucleic acid is identified, it can be cloned and, if necessary,
its constituent parts recombined to form the entire HA protein nucleic acid. Once
isolated from its natural source, contained within a plasmid or other vector
or excised therefrom as a linear nucleic acid segment, the recombinant HA protein
nucleic acid can be further used as a probe to identify and isolate other HA protein
nucleic acids. It can also be used as a "precursor" nucleic acid to make modified
or variant HA protein nucleic acids and proteins.
Using the nucleic acids of the present invention which encode HA protein, a variety
of expression vectors are made. The expression vectors may be either
self-replicating extrachromosomal vectors or vectors which integrate into a host genome.
Generally, these expression vectors include transcriptional and translational
regulatory nucleic acid operably linked to the nucleic acid encoding the HA protein.
"Operably linked" in this context means that the transcriptional and translational
regulatory DNA is positioned relative to the coding sequence of the HA protein
in such a manner that transcription is initiated. Generally, this will mean that the
promoter and transcriptional initiation or start sequences are positioned 5' to the
HA protein coding region. The transcriptional and translational regulatory nucleic
acid will generally be appropriate to the host cell used to express the HA protein;
for example, transcriptional and translational regulatory nucleic acid sequences from
Bacillus will be used to express the HA protein in Bacillus. Numerous types of
appropriate expression vectors, and suitable regulatory sequences are known in the
art for a variety of host cells.

In general, the transcriptional and translational regulatory sequences may include,
but are not limited to, promoter sequences, leader or signal sequences, ribosomal
binding sites, transcriptional start and stop sequences, translational start and stop
sequences, and enhancer or activator sequences. In a preferred embodiment, the
regulatory sequences include a promoter and transcriptional start and stop sequences.

Promoter sequences encode either constitutive or inducible promoters. The
promoters may be either naturally occurring promoters or hybrid promoters. Hybrid
promoters, which combine elements of more than one promoter, are also known
in the art, and are useful in the present invention.

In addition, the expression vector may comprise additional elements. For example,
the expression vector may have two replication systems, thus allowing it to be
maintained in two organisms, for example in mammalian or insect cells for
expression and in a procaryotic host for cloning and amplification. Furthermore,
for integrating expression vectors, the expression vector contains at least one
sequence homologous to the host cell genome, and preferably two homologous
sequences which flank the expression construct. The integrating vector may be
directed to a specific locus [...]
sequence for inclusion in the vector. Constructs for integrating vectors are well
known in the art.

WO 96/30519

PCT/US96/04031

In addition, in a preferred embodiment, the expression vector contains a selectable
marker gene to allow the selection of transformed host cells. Selection genes are
well known in the art and will vary with the host cell used.



The HA proteins of the present invention are produced by culturing a host cell
transformed with an expression vector containing nucleic acid encoding a HA
protein, under the appropriate conditions to induce or cause expression of the HA
protein. The conditions appropriate for HA protein expression will vary with the
choice of the expression vector and the host cell, and will be easily ascertained by
one skilled in the art through routine experimentation. For example, the use of
constitutive promoters in the expression vector will require optimizing the growth
and proliferation of the host cell, while the use of an inducible promoter requires
the appropriate growth conditions for induction. In addition, in some embodiments,
the timing of the harvest is important. For example, the baculoviral systems used
in insect cell expression are lytic viruses, and thus harvest time selection can be
crucial for product yield.

Appropriate host cells include yeast, bacteria, archebacteria, fungi, and insect and
animal cells, including mammalian cells. Of particular interest are Drosophila
melanogaster cells, Saccharomyces cerevisiae and other yeasts, E. coli, Bacillus
subtilis, SF9 cells, C129 cells, 293 cells, Neurospora, BHK, CHO, COS, and HeLa
cells, immortalized mammalian myeloid and lymphoid cell lines.

In a preferred embodiment, HA proteins are expressed in bacterial systems.
Bacterial expression systems are well known in the art.

A suitable bacterial promoter is any nucleic acid sequence capable of binding
bacterial RNA polymerase and initiating the downstream (3') transcription of the
coding sequence of HA protein into mRNA. A bacterial promoter has a transcription
initiation region which is usually placed proximal to the 5' end of the coding
sequence. This transcription initiation region typically includes an RNA polymerase
binding site and a transcription initiation site. Sequences encoding metabolic
pathway enzymes provide particularly useful promoter sequences. Examples include
promoter sequences derived from sugar metabolizing enzymes, such as galactose,
lactose and maltose, and sequences derived from biosynthetic enzymes such as
tryptophan. Promoters from bacteriophage may also be used and are known in the
art. In addition, synthetic promoters and hybrid promoters are also useful; for
example, the tac promoter is a hybrid of the trp and lac promoter sequences.
Furthermore, a bacterial promoter can include naturally occurring promoters of
non-bacterial origin that have the ability to bind bacterial RNA polymerase and initiate
transcription.

In addition to a functioning promoter sequence, an efficient ribosome binding site
is desirable. In E. coli, the ribosome binding site is called the Shine-Dalgarno (SD)
sequence and includes an initiation codon and a sequence 3-9 nucleotides in length
located 3-11 nucleotides upstream of the initiation codon.
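
The spacing rule just described can be illustrated with a small scan for Shine-Dalgarno-like motifs upstream of ATG codons. This is a sketch under explicit assumptions: it uses the commonly cited consensus AGGAGG (not given in this specification), requires an exact match to a consensus substring of at least four nucleotides, and interprets "3-11 nucleotides upstream" as the gap between the end of the motif and the A of ATG. The function name and test sequence are hypothetical.

```python
SD_CONSENSUS = "AGGAGG"  # commonly cited E. coli consensus (assumption)


def find_sd_sites(seq: str, min_len: int = 4) -> list[tuple[int, str, int]]:
    """Return (atg_index, motif, spacing) for ATGs preceded by an SD-like motif.

    `spacing` counts the nucleotides between the motif's last base and the ATG.
    For each ATG the longest matching motif is reported.
    """
    seq = seq.upper()
    hits = []
    for atg in range(len(seq) - 2):
        if seq[atg:atg + 3] != "ATG":
            continue
        best = None
        for spacing in range(3, 12):            # 3 to 11 nt gap
            end = atg - spacing
            for n in range(len(SD_CONSENSUS), min_len - 1, -1):
                begin = end - n
                if begin < 0:
                    continue
                candidate = seq[begin:end]
                if candidate in SD_CONSENSUS and (best is None or n > len(best[0])):
                    best = (candidate, spacing)
        if best is not None:
            hits.append((atg, best[0], best[1]))
    return hits


# Hypothetical sequence: AGGAGG, a 5-nt spacer, then ATG.
print(find_sd_sites("TTTAGGAGGTTTTTATGAAA"))
```

Real ribosome binding sites tolerate mismatches to the consensus, so an exact-substring test will miss many functional sites; it is used here only to keep the spacing logic easy to follow.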

The expression vector may also include a signal peptide sequence that provides
for secretion of the HA protein in bacteria. The signal sequence typically encodes
a signal peptide comprised of hydrophobic amino acids which direct the secretion
[...] media (gram-positive bacteria) or into the periplasmic space, located
between the inner and outer membrane of the cell (gram-negative bacteria).

[...] genes include genes which render the bacteria resistant to drugs such as
ampicillin, chloramphenicol, [...] neomycin [...]. Selectable
markers also include biosynthetic genes, such as those in the histidine, tryptophan
and leucine biosynthetic pathways.

These components are assembled into expression vectors. Expression vectors for
bacteria are well known in the art, and include vectors for Bacillus subtilis, E. coli,
Streptococcus cremoris, and Streptococcus lividans, among others.

The bacterial expression vectors are transformed into bacterial host cells using
techniques well known in the art, such as calcium chloride treatment,
electroporation, and others.

In one embodiment, HA proteins are produced in insect cells. Expression vectors
for the transformation of insect cells, and [...]
vectors, are well known in the art. Briefly, baculovirus is a very large DNA virus
which produces its coat protein at very [...]
genome, exogenous genes must be placed in the viral genome by recombination.
Accordingly, the components of the expression system include: a transfer vector,
usually a bacterial plasmid, which contains both a fragment of the baculovirus
genome, and a convenient [...]
type baculovirus with a sequence homologous to the baculovirus [...]
in the transfer vector (this allows for the homologous recombination of the
heterologous gene into the baculovirus genome); and appropriate insect host cells
and growth media.

Mammalian expression systems are also known in the art and are used in one
embodiment. A mammalian promoter is any DNA sequence capable of binding
mammalian RNA polymerase and initiating the downstream (3') transcription of
a coding sequence for HA protein into mRNA. A promoter will have a transcription
initiating region, which is usually placed proximal to the 5' end of the coding
sequence, and a TATA box, usually located 25-30 base pairs upstream of the
transcription initiation site. The TATA box is thought to direct RNA polymerase
II to begin RNA synthesis at the correct site. A mammalian promoter will also
contain an upstream promoter element, typically located within 100 to 200 base
pairs upstream of the TATA box. An upstream promoter element determines the
rate at which transcription is initiated and can act in either orientation. Of particular
use as mammalian promoters are the promoters from mammalian viral genes, since
the viral genes are often highly expressed and have a broad host range. Examples
include the SV40 early promoter, mouse mammary tumor virus LTR promoter,
adenovirus major late promoter, and herpes simplex virus promoter.

Typically, transcription termination and polyadenylation sequences recognized by
mammalian cells are regulatory regions located 3' to the translation stop codon and
thus, together with the promoter elements, flank the coding sequence. The 3'
terminus of the mature mRNA is formed by site-specific post-transcriptional cleavage
and polyadenylation. Examples of transcription terminator and polyadenylation
signals include those derived from SV40.

The methods of introducing exogenous nucleic acid into mammalian hosts, as well
as other hosts, are well known in the art, and will vary with the host cell used.

protein, using cassette mutagenesis or other techniques well known in the art, to
produce DNA encoding the variant, and thereafter expressing the DNA in
recombinant cell culture as outlined above. However, variant HA protein fragments
having up to about 100-150 residues may be prepared by in vitro synthesis using
established techniques. Amino acid sequence variants are characterized by the
predetermined nature of the variation, a feature that sets them apart from naturally
occurring allelic or interspecies variation of the HA protein amino acid sequence.
The variants typically exhibit the same qualitative biological activity as the naturally
occurring analogue, although variants can also be selected which have modified
characteristics as will be more fully outlined below.

While the site or region for introducing an amino acid sequence variation is
predetermined, the mutation per se need not be predetermined. For example, in
order to optimize the performance of a mutation at a given site, random mutagenesis
may be conducted at the target codon or region and the expressed HA protein
variants screened for the optimal combination of desired activity. Techniques for
making substitution mutations at predetermined sites in DNA having a known
sequence are well known, for example, M13 primer mutagenesis. Screening of the
mutants is done using assays of HA protein activities; for example, mutated HA
genes are placed in HA deletion strains and tested for HA activity, as disclosed
herein. The creation of deletion strains, given a gene sequence, is known in the
art. For example, nucleic acid encoding the variants may be expressed in an
adhesion deficient strain, and the adhesion and infectivity of the variant
Haemophilus influenzae evaluated. For example, as outlined below, the variants
may be expressed in the E. coli DH5α non-adherent strain, and the transformed
E. coli strain evaluated for adherence using Chang conjunctival cells.

Amino acid substitutions are typically of single residues; insertions usually will
be on the order of from about 1 to 20 amino acids, although considerably larger
insertions may be tolerated. In some cases deletions may be much larger, as for
example when one of the domains of the HA protein is deleted.

Substitutions, deletions, insertions or any combination thereof may be used to
arrive at a final derivative. Generally these changes are done on a few amino acids to
minimize the alteration of the molecule. However, larger changes may be tolerated
in certain circumstances.

When small alterations in the characteristics of the HA protein are desired,
substitutions are generally made in accordance with the following chart:

Chart I

Original Residue

Substantial changes in function or immunological identity are made by selecting
substitutions that are less conservative than those shown in Chart I. For example,
substitutions may be made which more significantly affect: the structure of the
polypeptide backbone in the area of the alteration, for example the alpha-helical
or beta-sheet structure; the charge or hydrophobicity of the molecule at the target
site; or the bulk of the side chain. The substitutions which in general are expected
to produce the greatest changes in the polypeptide's properties are those in which
(a) a hydrophilic residue, e.g. seryl or threonyl, is substituted for (or by) a
hydrophobic residue, e.g. leucyl, isoleucyl, phenylalanyl, valyl or alanyl; (b) a
cysteine or proline is substituted for (or by) any other residue; (c) a residue having
an electropositive side chain, e.g. lysyl, arginyl, or histidyl, is substituted for (or
by) an electronegative residue, e.g. glutamyl or aspartyl; or (d) a residue having
a bulky side chain, e.g. phenylalanine, is substituted for (or by) one not having a
side chain, e.g. glycine.
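For illustration, the four classes (a) through (d) above may be expressed as a simple lookup. The following Python sketch is an assumption-laden simplification: the residue sets contain only the examples recited above (the text says "e.g.", so the real classes are broader), and the function name is hypothetical.

```python
# Residue sets limited to the examples recited in the text (one-letter codes).
HYDROPHILIC = {"S", "T"}                  # seryl, threonyl
HYDROPHOBIC = {"L", "I", "F", "V", "A"}   # leucyl, isoleucyl, phenylalanyl, valyl, alanyl
ELECTROPOSITIVE = {"K", "R", "H"}         # lysyl, arginyl, histidyl
ELECTRONEGATIVE = {"E", "D"}              # glutamyl, aspartyl
BULKY = {"F"}                             # phenylalanine
NO_SIDE_CHAIN = {"G"}                     # glycine
CYS_PRO = {"C", "P"}                      # cysteine, proline


def _between(a, b, first, second):
    """True if one residue is in `first` and the other is in `second`."""
    return (a in first and b in second) or (a in second and b in first)


def nonconservative_classes(original, replacement):
    """Return the list of classes (a)-(d) that a substitution falls into.

    An empty list means the substitution matches none of the four classes
    under the simplified residue sets assumed here.
    """
    original, replacement = original.upper(), replacement.upper()
    classes = []
    if original == replacement:
        return classes
    if _between(original, replacement, HYDROPHILIC, HYDROPHOBIC):
        classes.append("a")
    if original in CYS_PRO or replacement in CYS_PRO:
        classes.append("b")
    if _between(original, replacement, ELECTROPOSITIVE, ELECTRONEGATIVE):
        classes.append("c")
    if _between(original, replacement, BULKY, NO_SIDE_CHAIN):
        classes.append("d")
    return classes


if __name__ == "__main__":
    for pair in [("S", "L"), ("P", "A"), ("K", "E"), ("F", "G"), ("L", "I")]:
        print(pair, nonconservative_classes(*pair))
```

A substitution that returns an empty list is not thereby shown to be conservative; it merely falls outside the recited examples.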

The variants typically exhibit the same qualitative biological activity and will elicit
the same immune response as the naturally-occurring analogue, although variants
also are selected to modify the characteristics of the polypeptide as needed.
Alternatively, the variant may be designed such that the biological activity of the
HA protein is altered. For example, the Walker box ATP-binding motif may be
altered or eliminated.

In a preferred embodiment, the HA protein is purified or isolated after expression.
HA proteins may be isolated or purified in a variety of ways known to those skilled
in the art depending on what other components are present in the sample. Standard
purification methods include electrophoretic, molecular, immunological and
chromatographic techniques, including ion exchange, hydrophobic, affinity, and
reverse-phase HPLC chromatography, and chromatofocusing. For example, the
HA protein may be purified using a standard anti-HA antibody column.
Ultrafiltration and diafiltration techniques, in conjunction with protein concentration,
are also useful. For general guidance in suitable purification techniques, see Scopes,
R., Protein Purification, Springer-Verlag, NY (1982). The degree of purification
necessary will vary depending on the use of the HA protein. In some instances no
purification will be necessary.

Once expressed and purified if necessary, the HA proteins are useful in a number
of applications.

For example, the HA proteins can be coupled, using standard technology, to affinity
chromatography columns. These columns may then be used to purify antibodies
from samples obtained from animals or patients exposed to the Haemophilus
influenzae organism. The purified antibodies may then be used as outlined below.



Additionally, the HA proteins are useful to make antibodies to HA proteins. These
antibodies find use in a number of applications. The antibodies are used to diagnose
the presence of an Haemophilus influenzae infection in a sample or patient. In a
preferred embodiment, the antibodies are used to detect the presence of nontypable
Haemophilus influenzae (NTHI), although typable H. influenzae infections are also
detected using the antibodies.

This diagnosis will be done using techniques well known in the art; for example,
samples such as blood or tissue samples may be obtained from a patient and tested
for reactivity with the antibodies, for example using standard techniques such as
ELISA. In a preferred embodiment, monoclonal antibodies are generated to the
HA protein, using techniques well known in the art. As outlined above, the
antibodies may be generated to the full length HA protein, or a portion of the HA
protein.

Antibodies generated to HA proteins may also be used in passive immunization
treatments, as is known in the art.

Antibodies generated to unique sequences of HA proteins may also be used to screen
expression libraries from other organisms to find, and subsequently clone, HA
nucleic acids from other organisms.

In one embodiment, the antibodies may be directly or indirectly labelled. By
"labelled" herein is meant a compound that has at least one element, isotope or
chemical compound attached to enable the detection of the compound. In general,
labels fall into three classes: a) isotopic labels, which may be radioactive or heavy
isotopes; b) immune labels, which may be antibodies or antigens; and c) colored
or fluorescent dyes. The labels may be incorporated into the compound at any
position. Thus, for example, the HA protein antibody may be labelled for detection,
or a secondary antibody to the HA protein antibody may be created and labelled.

In one embodiment, the antibodies generated to the HA proteins of the present
invention are used to purify or separate HA proteins or the Haemophilus influenzae
organism from a sample. Thus, for example, antibodies generated to HA proteins
which will bind to the Haemophilus influenzae organism may be coupled, using
standard technology, to affinity chromatography columns. These columns can be
used to pull out the Haemophilus organism from environmental or tissue samples.

In a preferred embodiment, the HA proteins of the present invention are used as
vaccines for the prophylactic or therapeutic treatment of a Haemophilus influenzae
infection in a patient. By "vaccine" or "immunogenic compositions" herein is meant
an antigen or compound which elicits an immune response in an animal or patient.
The vaccine may be administered prophylactically, for example to a patient never
previously exposed to the antigen, such that subsequent infection by the
Haemophilus organism is prevented. Alternatively, the vaccine may
be administered therapeutically to a patient previously exposed or infected by the
Haemophilus organism. While infection cannot be prevented, in
this case an immune response is generated which allows the patient's immune system
to more effectively combat the infection. Thus, for example, there may be a decrease
or lessening of the symptoms associated with infection.

A "patient" for the purposes of the present invention includes both humans and other
animals and organisms. Thus the methods are applicable to both human therapy
and veterinary applications.

The administration of the HA protein as a vaccine is done in a variety of ways.
Generally, the HA proteins can be formulated according to known methods to
prepare pharmaceutically useful compositions, whereby therapeutically effective
amounts of the HA protein are combined in admixture with a pharmaceutically
acceptable carrier vehicle. Suitable vehicles and their formulation are well known
in the art. Such compositions will contain an effective amount of the HA protein
together with a suitable amount of vehicle in order to prepare pharmaceutically
acceptable compositions for effective administration to the host. The composition
may include salts, buffers, carrier proteins such as serum albumin, targeting
molecules to localize the HA protein at the appropriate site or tissue within the
organism.

In one embodiment, the vaccine is administered as a single dose; that is, one dose
is adequate to induce a sufficient immune response to prophylactically or
therapeutically treat a Haemophilus influenzae infection. Alternatively,
the vaccine is administered as several doses over a period of time, as a primary
vaccination and "booster" vaccinations.



By "therapeutically effective amounts" herein is meant an amount of the HA protein
which is sufficient to induce an immune response. This amount may be different
depending on whether prophylactic or therapeutic treatment is desired. Generally,
this ranges from about 0.001 mg to about 1 gm, with a preferred range of about 0.05
to about 0.5 gm. These amounts may be adjusted if adjuvants are used.

The following examples serve to more fully describe the manner of using the above-
described invention, as well as to set forth the best modes contemplated for carrying
out various aspects of the invention. It is understood that these examples in no way
serve to limit the true scope of this invention, but rather are presented for illustrative
purposes. All references cited herein are specifically incorporated by reference.

EXAMPLE 1 
Cloning of HA1 

Many protocols are substantially the same as those outlined in St. Geme et al., Mol.
Microbiol. 15(1):77-85 (1995).

Bacterial strains, plasmids, and phages.



Nontypable H. influenzae strain 11 was the clinical isolate chosen as a prototypic
HMW1/HMW2-non-expressing strain, although a variety of encapsulated typable
strains can be used to clone the protein using the sequences of the figures. The
organism was isolated in pure culture from the middle ear fluid of a child with acute
otitis media. The strain was identified as H. influenzae by standard methods and
was classified as nontypable by its failure to agglutinate with a panel of typing
antisera for H. influenzae types a to f (Burroughs Wellcome Co., Research Triangle
Park, N.C.) and failure to show lines of precipitation with these antisera in
counterimmunoelectrophoresis assays. Strain 11 adheres efficiently to Chang
conjunctival cells in vitro, at levels comparable to those previously demonstrated
for NTHI strains expressing HMW1/HMW2-like proteins (data not shown).
Convalescent serum from the child infected with this strain demonstrated an
antibody response directed predominantly against high molecular
weight proteins with molecular weights greater than 100 kDa.

M13mp18 and M13mp19 were obtained from New England BioLabs, Inc. (Beverly,
Mass.). pT7-7 was the kind gift of Stanley Tabor. This vector contains the T7 RNA
polymerase promoter φ10, a ribosome-binding site, and the translational start site
for the T7 gene 10 protein upstream from a multiple cloning site.

Molecular cloning and plasmid subcloning.

The recombinant phage containing the HA1 gene was isolated and characterized
using methods similar to those described previously. In brief, chromosomal DNA
from strain 11 was prepared and Sau3A partial restriction digests of the DNA were
prepared and fractionated on 0.7% agarose gels. Fractions containing DNA
fragments in the 9- to 20-kbp range were pooled, and a library was prepared by
ligation into λEMBL3 arms. Ligation mixtures were packaged in vitro with
Gigapack (Stratagene) and plate-amplified in a P2 lysogen of E. coli LE392.
Lambda plaque immunological screening was performed as described by Maniatis
et al. (Molecular Cloning: A Laboratory Manual, 2nd ed., Cold Spring Harbor Laboratory
Press). For plasmid subcloning studies, DNA from recombinant phage was
subcloned into the T7 expression plasmid pT7-7. Standard methods were used for
manipulation of cloned DNA as described by Maniatis et al. (supra).

Plasmid pHMW8-3 was generated by isolating an 11 kbp XbaI fragment from
purified DNA from recombinant phage clone 11-17 and ligating into XbaI-cut pT7-7.
Plasmid pHMW8-4 was generated by isolating a 10 kbp restriction fragment and
ligating into correspondingly cut pT7-7.
Plasmid pHMW8-5 was generated by digesting plasmid pHMW8-3 DNA with ClaI,
isolating the larger fragment and religating. Plasmid pHMW8-6 was generated by
digesting pHMW8-4 with SpeI, which cuts at a unique site within the HA1 gene,
blunt-ending the resulting fragment, and inserting a kanamycin resistance cassette into
the SpeI site. Plasmid pHMW8-7 was generated by digesting pHMW8-3 with NruI
and HindIII, isolating the fragment containing pT7-7, blunt-ending and religating.
The plasmid restriction maps are shown in Figure 6.

DNA sequence analysis. 

DNA sequence analysis was performed by the dideoxy method with the U.S.
Biochemicals Sequenase kit as suggested by the manufacturer. [35S]dATP was
purchased from New England Nuclear (Boston, Mass.). Data were analyzed with
Compugene software and the Genetics Computer Group program from the
University of Wisconsin on a Digital VAX 8530 computer. Several 21-mer
oligonucleotide primers were generated as necessary to complete the sequence.

Adherence assays.

Adherence assays were done with Chang epithelial cells [Wong-Kilbourne
derivative, clone 1-5c-4 (human conjunctiva), ATCC CCL 20.2], which were seeded
into wells of 24-well tissue culture plates, as described (St. Geme III et al., Infect.
Immun. 58:4036 (1990)). Bacteria were inoculated into broth and allowed to grow
to a density of approximately 2 x 10^9 colony-forming units per ml. Approximately
2 x 10^7 colony-forming units were inoculated onto epithelial cell monolayers, and
plates were gently centrifuged at 165 x g for 5 min to facilitate contact between
bacteria and the epithelial surface. After incubation for 30 min at 37°C in 5% CO2,
monolayers were rinsed five times with phosphate buffered saline (PBS) to remove
nonadherent organisms and were treated with trypsin-EDTA (0.05% trypsin/0.5%
EDTA) in PBS to release them from the plastic support. Well contents were
agitated, and dilutions were plated on solid medium to yield the number of adherent
bacteria per monolayer. Percent adherence was calculated by dividing the number
of adherent colony-forming units per monolayer by the number of inoculated colony-
forming units.
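The percent adherence calculation described above may be sketched as follows. This is a minimal illustration only: the function names, the plate-count back-calculation and the example numbers are assumptions and do not reproduce any particular experiment reported herein.

```python
def cfu_per_monolayer(colonies, dilution_factor, plated_volume_ml, well_volume_ml):
    """Back-calculate adherent CFU per monolayer from a plate count.

    colonies          -- colonies counted on the plate
    dilution_factor   -- fold dilution of the well contents (e.g. 1e4)
    plated_volume_ml  -- volume of diluted sample spread on the plate
    well_volume_ml    -- volume in which the monolayer was resuspended
    """
    if plated_volume_ml <= 0 or well_volume_ml <= 0:
        raise ValueError("volumes must be positive")
    return colonies * dilution_factor / plated_volume_ml * well_volume_ml


def percent_adherence(adherent_cfu, inoculated_cfu):
    """Adherent CFU per monolayer divided by inoculated CFU, as a percentage."""
    if inoculated_cfu <= 0:
        raise ValueError("inoculated_cfu must be positive")
    return 100.0 * adherent_cfu / inoculated_cfu


if __name__ == "__main__":
    # Hypothetical plate count; the inoculum of about 2 x 10^7 CFU follows the text.
    adherent = cfu_per_monolayer(colonies=87, dilution_factor=1e4,
                                 plated_volume_ml=0.1, well_volume_ml=1.0)
    print(round(percent_adherence(adherent, 2e7), 1))
```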

Isolation and characterization of recombinant phage expressing the strain 11
high molecular weight adhesion protein.

The nontypable Haemophilus influenzae strain 11 chromosomal DNA library was
screened immunologically with convalescent serum from the child infected with
strain 11. Immunoreactive clones were screened by Western blot for expression
of high molecular weight proteins, and
two different classes of recombinant clones were recovered. A single clone
designated 11-17 was recovered which expressed the HA1 protein. The recombinant
protein expressed by this clone had an apparent molecular weight of greater than
200 kDa.

Transformation into E. coli

Plasmids were introduced into the DH5α strain of E. coli (Maniatis, supra), which is
a non-adherent strain, using electroporation (Dower et al., Nucl. Acids Res. 16:6127
(1988)). The results are shown in Table 1.

Table 1

Strain              % Adherence*
DH5α(pHMW8-4)       43.3 ± 5.0%
DH5α(pHMW8-5)       41.3 ± 3.3%
DH5α(pHMW8-6)       0.6 ± 0.3%
DH5α(pHMW8-7)
DH5α(pT7-7)         0.4 ± 0.1%

*Adherence was measured in a 30 minute assay and was calculated by dividing the
number of adherent bacteria by the number of inoculated bacteria. Values are the
mean ± SEM of measurements made in triplicate from a representative experiment.
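As a hedged illustration of how a "mean ± SEM" value for triplicate measurements is obtained, the following Python sketch uses the sample standard deviation divided by the square root of the number of replicates. The replicate values shown are hypothetical and are not the raw data underlying Table 1.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for replicate measurements.

    SEM is computed as the sample standard deviation (n - 1 denominator)
    divided by sqrt(n); at least two replicates are required.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are required")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


if __name__ == "__main__":
    # Hypothetical triplicate percent-adherence values.
    mean, sem = mean_sem([40.0, 43.0, 46.0])
    print(f"{mean:.1f} ± {sem:.1f}%")
```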

In addition, a monoclonal antibody made by standard procedures, directed against
the strain 11 protein, recognized proteins in 57 of 60 epidemiologically-unrelated
NTHI. However, Southern analysis using the gene indicated that roughly only 25%
of the tested strains actually hybridized to the gene (data not shown).

EXAMPLE 2

Cloning of HA2 

In a recent study we examined a series of H. influenzae type b isolates by
transmission electron microscopy and visualized short, thin surface fibrils distinct
from pili (St. Geme, J.W. III, and D. Cutter. 1995. Evidence that surface fibrils
expressed by Haemophilus influenzae type b promote attachment to human epithelial
cells. Mol. Microbiol. 15:77-85.). In that study, the large genetic locus involved
in the expression of these appendages was isolated.

Bacterial strains and plasmids

H. influenzae strain C54 is a type b strain that has been described previously
(Pichichero, M.E., P. Anderson, M. Loeb, and D.H. Smith. 1982. Do pili play a
role in pathogenicity of Haemophilus influenzae type b? Lancet ii:960-962.). Strain
C54-Tn400.23 is a mutant that contains a mini-Tn10 kan element in the hsf locus
and demonstrates minimal in vitro adherence (St. Geme, J.W. III, and D. Cutter.
1995. Evidence that surface fibrils expressed by Haemophilus influenzae type b
promote attachment to human epithelial cells. Mol. Microbiol. 15:77-85.). Strains
1053, 1058, 1060, 1063, 1065, 1069, 1070, 1076, 1081, and 1084 are H. influenzae
type b isolates generously provided by J. Musser (Baylor University, Houston,
Texas) (Musser et al. 1990. Global genetic structure and molecular epidemiology
of encapsulated Haemophilus influenzae. Rev. Infect. Dis. 12:75-111.). H.
influenzae strains SM4 (type a), SM6 (type d), SM7 (type e), and SM72 (type c)
are type strains obtained from R. Facklam at the Centers for Disease Control
(Atlanta, Georgia). Strains 142, 327, and 351 are H. influenzae type e isolates, and
strains 134, 219, 256, and 501 are H. influenzae type f isolates obtained from H.
Kayhty (Finnish National Public Health Institute, Helsinki). Strain Rd (type d) and
the 15 nontypable isolates examined by Southern analysis have been described
previously (Alexander et al., J. Exp. Med. 83:345-359 (1951); Barenkamp et al.,
Infect. Immun. 60:1302-1313 (1992)). E. coli DH5α is a nonadherent laboratory
strain that was originally obtained from Gibco BRL. E. coli strain BL21(DE3) was
a gift from F.W. Studier and contains a single copy of the T7 RNA polymerase gene
under the control of the lac regulatory system (Studier, F.W., and B.A. Moffatt.
1986. Use of bacteriophage T7 RNA polymerase to direct high-level expression
of cloned genes. J. Mol. Biol. 189:113-130.). Plasmid pT7-7 was provided by S.
Tabor and contains the T7 RNA polymerase promoter φ10, a ribosome-binding site,
and the translational start site for the T7 gene 10 protein upstream from a multiple
cloning site (Tabor, S., and C.C. Richardson. 1985. A bacteriophage T7 RNA
polymerase/promoter system for controlled exclusive expression of specific genes.
Proc. Natl. Acad. Sci. USA 82:1074-1078.). pUC19 is a high-copy-number plasmid
that has been previously described (Yanisch-Perron et al., Gene 33:103-119 (1985)).
pDC400 is a pUC19 derivative that harbors the H. influenzae strain C54 surface
fibril locus and is sufficient to promote in vitro adherence by laboratory strains of
E. coli (St. Geme, J.W. III, and D. Cutter. 1995. Evidence that surface fibrils
expressed by Haemophilus influenzae type b promote attachment to human epithelial
cells. Mol. Microbiol. 15:77-85.). pHMW8-5 is a pT7-7 derivative that contains
the H. influenzae strain 11 hia locus and also promotes adherence by nonadherent
laboratory strains of E. coli (Barenkamp, S.J., and J.W. St. Geme III. Identification
of a second family of high molecular weight adhesion proteins expressed by
nontypable Haemophilus influenzae. Mol. Microbiol., in press.). pHMW8-6
contains the H. influenzae hia locus interrupted by a kanamycin cassette
(Barenkamp, S.J., and J.W. St. Geme III. Identification of a second family of high
molecular weight adhesion proteins expressed by nontypable Haemophilus
influenzae. Mol. Microbiol., in press.). pUC4K served as the source of the
kanamycin-resistance gene that was used as a probe in Southern analysis (Vieira,
J., and J. Messing. 1982. The pUC plasmids, an M13mp7-derived system for
insertion mutagenesis and sequencing with synthetic universal primers. Gene
19:259-268.).

Culture conditions 

H. influenzae strains were grown on chocolate agar supplemented with 1% IsoVitaleX,
on brain heart infusion agar supplemented with hemin and NAD (BHI-DB agar),
or in brain heart infusion broth supplemented with hemin and NAD (BHIs)
(Anderson, P., R.B. Johnston Jr., and D.H. Smith. 1972. Human serum activity
against Haemophilus influenzae type b. J. Clin. Invest. 51:31-38.). These strains
were stored at -80°C in brain heart infusion broth with 25% glycerol. E. coli strains
were grown on Luria Bertani (LB) agar or in LB broth and were stored at -80°C
in LB broth with 50% glycerol. For H. influenzae, kanamycin was used in a
concentration of 25 µg/ml. Antibiotic concentrations for E. coli included the
following: ampicillin or carbenicillin 100 µg/ml and kanamycin 50 µg/ml.

Induction of plasmid-encoded proteins 

To identify plasmid-encoded proteins, the bacteriophage T7 expression vector pT7-7
was employed and the relevant pT7-7 derivatives were transformed into E. coli
BL21(DE3). Activation of the T7 promoter was achieved by inducing expression
of T7 RNA polymerase with isopropyl-β-D-thiogalactopyranoside (final
concentration, .. mM). After induction for 30 minutes at 37°C, rifampicin was added
to a final concentration of 200 µg/ml. Thirty minutes later, 1 ml of culture was
pulsed with 50 µCi of trans-[35S]-label (ICN, Irvine, Calif.) for 5 minutes. Bacteria
were harvested, and whole cell lysates were resuspended in Laemmli buffer for
analysis by sodium dodecyl sulfate-polyacrylamide gel electrophoresis on 7.5%
acrylamide gels (Laemmli, U.K. 1970. Cleavage of structural proteins during the
assembly of the head of bacteriophage T4. Nature (London) 227:680-685.).
Autoradiography was performed with Kodak XAR-5 film.

Recombinant DNA methods 

DNA ligations, restriction endonuclease digestions, and gel electrophoresis were
performed according to standard techniques (Sambrook, J., E.F. Fritsch, and T.
Maniatis. 1989. Molecular cloning: a laboratory manual, 2nd ed. Cold Spring Harbor
Laboratory, Cold Spring Harbor, N.Y.). Plasmids were introduced into E. coli
strains by either chemical transformation or electroporation, as described (Dower,
W.J., J.F. Miller, and C.W. Ragsdale. 1988. High efficiency transformation of E.
coli by high voltage electroporation. Nucleic Acids Res. 16:6127-6145; Sambrook,
J., E.F. Fritsch, and T. Maniatis. 1989. Molecular cloning: a laboratory manual.).
In H. influenzae, transformation was performed using the MIV method of Herriott
et al. (Herriott, R.M., E.M. Meyer, and M. Vogt. 1970. Defined nongrowth media
for stage II competence in Haemophilus influenzae. J. Bacteriol. 101:517-524.).

Adherence assays 

Adherence assays were performed with tissue culture cells which were seeded into
wells of 24-well tissue culture plates as previously described (St. Geme et al., Infect.
Immun. 58:4036-4044 (1990)). Adherence was measured after incubating bacteria
with epithelial monolayers for 30 minutes as described (St. Geme, J.W. III, S.
Falkow, and S.J. Barenkamp. 1993. High-molecular-weight proteins of nontypable
Haemophilus influenzae mediate attachment to human epithelial cells. Proc. Natl.
Acad. Sci. U.S.A. 90:2875-2879.). Tissue culture cells included Chang epithelial
cells (Wong-Kilbourne derivative, clone 1-5c-4 (human conjunctiva)) (ATCC CCL
20.2), KB cells (human oral epidermoid carcinoma) (ATCC CCL 17), HEp-2 cells
(human laryngeal epidermoid carcinoma) (ATCC CCL 23), A549 cells (human
lung carcinoma) (ATCC CCL 185), Intestine 407 cells (human embryonic intestine)
(ATCC CCL 6), HeLa cells (human cervical epitheloid carcinoma) (ATCC CCL
2), ME-180 cells (human cervical epidermoid carcinoma) (ATCC HTB 33), HEC-1B
cells (human endometrium) (ATCC HTB 113), and CHO-K1 cells (Chinese hamster
ovary) (ATCC CCL 61). Chang, KB, Intestine 407, HeLa, and HEC-1B cells were
maintained in modified Eagle medium with Earle's salts and non-essential amino
acids. HEp-2 cells were maintained in Dulbecco's modified Eagle medium, A549
cells and CHO-K1 cells in F12 medium (Ham), and ME-180 cells in McCoy 5A
medium. All media were supplemented with 10% heat-inactivated fetal bovine
serum.

Southern analysis 

Southern blotting was performed using high stringency conditions as previously
described (St. Geme, J.W. III, and S. Falkow. 1991. Loss of capsule expression by
Haemophilus influenzae type b results in enhanced adherence to and invasion of
human cells. Infect. Immun. 59:1325-1333.).

Microscopy

(St. Geme, J.W. III, and S. Falkow. 1990. Haemophilus influenzae adheres to and
enters cultured human epithelial cells. Infect. Immun. 58:4036-4044).

For transmission electron microscopy, bacteria were stained with 0.5% aqueous
uranyl acetate (St. Geme, J.W. III, and S. Falkow. 1991. Loss of capsule expression
by Haemophilus influenzae type b results in enhanced adherence to and invasion
of human cells. Infect. Immun. 59:1325-1333.) and examined using a Zeiss 10A
microscope.

We previously indicated that laboratory E. coli strains harboring the plasmid
pDC400 were capable of efficient attachment to cultured human epithelial cells
(St. Geme, J.W. III, and D. Cutter. 1995. Evidence that surface fibrils expressed
by Haemophilus influenzae type b promote attachment to human epithelial cells.
Mol. Microbiol. 15:77-85.). Subcloning studies and transposon mutagenesis
suggested that the relevant coding region was contained within an 8.3 kb XbaI
fragment (St. Geme, J.W. III, and D. Cutter. 1995. Evidence that surface fibrils
expressed by Haemophilus influenzae type b promote attachment to human epithelial
cells. Mol. Microbiol. 15:77-85.) (Figure 7). To confirm this conclusion, in the
present study this XbaI fragment was subcloned into pT7-7, generating plasmids
pDC601 and pDC602 (Figure 7). As predicted, expression of these plasmids in
E. coli DH5α was associated with a capacity for high level in vitro attachment
(Table 1).

Table 1. Adherence to Chang conjunctival cells.

Strain              ADHERENCE (% inoculum)(a)
DH5α/pT7-7          0.4 ± 0.1
DH5α/pDC400         25.3 ± 12
DH5α/pDC601         54.3 ± 7.5
DH5α/pDC602         55.5 ± 4.3
C54b+p-             98.7 ± 9.5
C54-HA1::kan(b)     1.5 ± 0.2
C54-Tn400.23(c)     3.3 ± 0.4

(a) Adherence was measured in a 30 minute assay and was calculated by dividing the
number of adherent bacteria by the number of inoculated bacteria. Values are the
mean ± SEM of measurements made in triplicate from representative experiments.
(b) Strain C54-HA1::kan was constructed by transforming C54b+p- with linearized
pHMW8-6, which contains the HA1 gene with an intragenic kanamycin cassette.
(c) Strain C54-Tn400.23 contains a mini-Tn10 kan element in the hsf locus (St. Geme
et al., Mol. Microbiol. 15:77-85 (1995)).

To determine the direction of transcription and identify plasmid-encoded proteins,
pDC601 and pDC602 were subsequently introduced into E. coli BL21(DE3),
producing BL21(DE3)/pDC601 and BL21(DE3)/pDC602, respectively. As a
negative control, pT7-7 was also transformed into BL21(DE3). The T7 promoter
in these three strains was induced with IPTG, and induced proteins were detected
using trans-[35S]-label. As shown in Figure 8, induction of BL21(DE3)/pDC601
resulted in expression of a large protein over 200 kDa in size along with several
slightly smaller proteins, which presumably represent degradation products. In
contrast, when BL21(DE3)/pDC602 and BL21(DE3)/pT7-7 were induced, there
was no expression of these proteins. This experiment indicated that the genetic
material contained in the 8.3 kb XbaI fragment is transcribed from left to right as
shown in Figure 7 and suggested that a single long open reading frame may be
present.

Nucleotide sequencing

Nucleotide sequence was determined using a Sequenase kit and double-stranded
plasmid template. DNA fragments were subcloned into pUC19 and sequenced along
both strands by primer walking. DNA sequence analysis was performed using the
Genetics Computer Group (GCG) software package from the University of
Wisconsin (Devereux, J., P. Haeberli, and O. Smithies. 1984. A comprehensive
set of sequence analysis programs for the VAX. Nucleic Acids Res. 12:387-395.).
Sequence similarity searches were carried out using the BLAST program of the
National Center for Biotechnology Information (Altschul, S.F., W. Gish, W. Miller,
E.W. Myers, and D.J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol.
215:403-410.).

Sequencing of the 8.3 kb XbaI fragment revealed a 7059 bp gene, which is
designated for literature purposes as hsf for Haemophilus surface fibrils, and is
referred to herein as HA2. This gene encodes a 2353-amino acid polypeptide,
referred to as Hsf or HA2, with a calculated molecular mass of 243.8 kDa, which
is similar in size to the observed protein species detected after induction of
BL21(DE3)/pDC601. The HA2 gene has a GC content of 42.8%, somewhat greater
than the published estimate of 38-39% for the whole genome (Fleischmann et al.
1995. Whole-genome random sequencing and assembly of Haemophilus influenzae
Rd. Science 269:496-512; Kilian, M. 1976. A taxonomic study of the genus
Haemophilus, with proposal of a new species. J. Gen. Microbiol. 93:9-62.). A
putative ribosomal binding site with the sequence AAGGTA begins 13 base pairs
upstream of the presumed initiation codon. A sequence similar to a rho-independent
transcription terminator is present beginning 20 nucleotides beyond the stop codon
and contains interrupted inverted repeats with the potential for forming a hairpin
structure containing a loop of two bases and a stem of 11 bases. Of note, a string
of 29 thymines spans the region from 149 to 121 nucleotides upstream of HA2.
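
The coordinates quoted in this paragraph can be cross-checked with a few lines of Python. The sketch below is illustrative only and is not part of the original analysis, which used the GCG package; the helper names `cds_length` and `offset_upstream` are hypothetical. It assumes the coding region location 163..7221 listed for SEQ ID NO:3 and uses the 42 bases immediately upstream of the initiation codon, plus the first four codons, as they appear in that listing.

```python
# Illustrative sketch; helper names are hypothetical, not from the source.

# Bases 121-174 of SEQ ID NO:3: 42 upstream bases followed by the first
# four codons of the HA2 open reading frame (ATG AAC AAA ATT).
REGION = (
    "GTATATTTATCATTAATTTTATTAAATATAAGGTAAATAAAA"
    "ATGAACAAAATT"
)
REGION_FIRST_BASE = 121          # 1-based coordinate of REGION[0]
CDS_START, CDS_END = 163, 7221   # CDS location listed for SEQ ID NO:3
RBS_MOTIF = "AAGGTA"             # putative ribosomal binding site


def cds_length(start, end):
    """Length in base pairs of an inclusive, 1-based coordinate range."""
    return end - start + 1


def offset_upstream(seq, motif, start_index):
    """Bases from the first base of the motif to the start codon.

    Uses the occurrence of the motif that lies closest to, and entirely
    upstream of, the 0-based start_index.
    """
    pos = seq.rfind(motif, 0, start_index)
    if pos < 0:
        raise ValueError("motif not found upstream of the start codon")
    return start_index - pos


start_index = CDS_START - REGION_FIRST_BASE   # 0-based index of the ATG
gene_bp = cds_length(CDS_START, CDS_END)
codons = gene_bp // 3
rbs_offset = offset_upstream(REGION, RBS_MOTIF, start_index)

print(gene_bp, codons, rbs_offset)
```

Under these assumptions the script reports a 7059 bp coding region (2353 codons) and places the AAGGTA motif 13 bases upstream of the ATG, in line with the figures given in the text.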

Homology to HA1

The nontypable H. influenzae nonpilus HA1 protein (called Hia in the
literature) promotes attachment to cultured human epithelial cells as outlined above.
Comparison of the predicted amino acid sequence of HA2 and the sequence of HA1
revealed 81% similarity and 72% identity overall. As depicted in Figure 5, the two
sequences are highly conserved at their N-terminal and C-terminal ends, and both
contain a Walker box nucleotide-binding motif. Interestingly, HA1 is encoded by
a 3.2 kb gene and is only 115 kDa. In this context it is noteworthy that three
separate stretches of HA2 (corresponding to amino acids 174 to 608, 847 to 1291,
and 1476 to 1914, respectively) show significant homology to the region of HA1
defined by amino acids 221 to 658 (Figure 5). Table 2 summarizes the level of
similarity and identity between these three stretches of HA2 and one another. The
suggestion is that the larger size of HA2 may relate in part to the presence of a
repeated domain which is present in single copy in HA1.

Table 2. Percent similarity and percent identity between HA2 repeats.

                          Percent Similarity/Percent Identity

                    HA2 174-608*     HA2 847-1291*    HA2 1476-1914*

HA2 174-608*                         65/53            76/60

HA2 847-1291*                                         70/56

HA2 1476-1914*

*Numbers correspond to amino acid residue positions in the full-length HA2 (Hsf)
protein.
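
Percent similarity and percent identity figures such as those in Table 2 are derived from pairwise alignments. The sketch below shows one simple way to compute both values from two sequences that have already been aligned. It is not the GCG procedure used to produce the table: the function name, the groups of residues treated as similar, and the toy sequences are all assumptions made for illustration, and a different similarity definition would give different numbers.

```python
# Illustrative sketch only; not the GCG procedure used for Table 2.

# Hypothetical groups of residues counted as "similar" (conservative
# substitutions). A scoring matrix with a threshold could be used instead.
SIMILAR_GROUPS = [
    set("ILVM"), set("FYW"), set("KRH"), set("DE"),
    set("NQ"), set("ST"), set("AG"),
]


def percent_similarity_identity(aligned_a, aligned_b):
    """Return (percent similarity, percent identity) for two aligned strings.

    Both strings must have the same length; '-' marks a gap. Columns that
    contain a gap in either sequence are excluded from the comparison.
    Identical residues also count as similar.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x == y:
            identical += 1
            similar += 1
        elif any(x in group and y in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * similar / compared, 100.0 * identical / compared


# Toy aligned sequences (made up for this example, not HA1 or HA2 data).
toy_a = "MNKIFNVIWN"
toy_b = "MNKLFDV-WN"
similarity, identity = percent_similarity_identity(toy_a, toy_b)
print(f"{similarity:.1f}/{identity:.1f}")
```

For the toy pair, nine ungapped columns are compared, seven are identical and one more is a conservative substitution, so the script prints 88.9/77.8 in the same similarity/identity order used in Table 2.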
To evaluate whether HA1 and HA2 are alleles of the same locus, a series of Southern
blots were performed. Samples of chromosomal DNA from strains C54 and 11 were
subjected to digestion with BglII, ClaI, and either PstI or XbaI. Resulting DNA
fragments were separated by agarose electrophoresis and transferred bidirectionally
to nitrocellulose membranes. One membrane was probed with a 3.3 kb internal
fragment of the HA2 gene (Figure 7), and the other membrane was probed with a
1.6 kb intragenic fragment of the HA1 gene. As shown in Figure 9, both probes
recognized exactly the same chromosomal fragments.

To obtain additional evidence that the HA2 and HA1 genes are homologs, the
inactivation of HA2 by transformation of H. influenzae strain C54bp with
insertionally inactivated HA1 was attempted. The plasmid pHMW8-6 (Barenkamp,
S.J., and J.W. St. Geme, III. Identification of a second family of high molecular weight
adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol.,
in press.), which contains the HA1 gene with an intragenic kanamycin cassette, was
linearized with Mel and introduced into competent C54. Southern hybridization
confirmed insertion of the kanamycin cassette into HA2 (not shown). Furthermore,
examination of the C54 mutant by negative staining transmission electron microscopy
revealed the loss of surface fibrils (not shown). Consistent with these findings, the
mutant strain demonstrated minimal attachment to Chang conjunctival cells (Table
1).



In additional experiments, the cellular binding specificities conferred by the HA2
and HA1 proteins were compared. As shown in Figure 10, DH5α/pDC601
(expressing HA2) demonstrated high level attachment to Chang cells, KB cells, HeLa
cells, and Intestine 407 cells, moderate level attachment to HEp-2 cells, and minimal
attachment to HEC-1B cells, ME-180 cells, and CHO-K1 cells. DH5α harboring
pHMW8-5 (expressing HA1) showed virtually the same pattern of attachment.
Giemsa staining and subsequent examination by light microscopy confirmed these
viable count adherence assay results.

Homology to other bacterial extracellular proteins 

A protein sequence similarity search was performed with the HA2 predicted amino
acid sequence using the BLAST network service of the National Center for
Biotechnology Information (Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and
D.J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.).
This search revealed low-level sequence similarity to a series of other bacterial
adherence factors, including HMW1 and HMW2 (the proteins previously identified
as being important adhesins in HA1-deficient nontypable H. influenzae strains; St.
Geme, J.W., III, S. Falkow, and S.J. Barenkamp. 1993. High-molecular-weight proteins
of nontypable Haemophilus influenzae mediate attachment to human epithelial cells.
Proc. Natl. Acad. Sci. U.S.A. 90:2875-2879.), AIDA-I (an adhesion protein expressed
by some diarrheagenic E. coli strains; Benz, I., and M.A. Schmidt. 1992. AIDA-I,
the adhesin involved in diffuse adherence of the diarrhoeagenic Escherichia coli strain
2787 (O126:H27), is synthesized via a precursor molecule. Mol. Microbiol.
6:1539-1546.), and Tsh (a hemagglutinin produced by an avian pathogenic E. coli
strain; Provence, D., and R. Curtiss III. 1994. Isolation and characterization of a gene
involved in hemagglutination by an avian pathogenic Escherichia coli strain. Infect.
Immun. 62:1369-1380.). In addition, HA2 showed homology to SepA, a Shigella
flexneri secreted protein that appears to play a role in tissue invasion
(Benjelloun-Touimi, Z., P.J. Sansonetti, and C. Parsot. 1995. SepA, the major
extracellular protein of Shigella flexneri: autonomous secretion and involvement in
tissue invasion. Mol. Microbiol. 17:123-135.). Alignment of HA2 with HMW1,
HMW2, AIDA-I, Tsh, and SepA revealed a highly conserved N-terminal domain
(Figure 11). In AIDA-I, Tsh, and SepA, this N-terminal extremity precedes a typical
procaryotic signal sequence (Benjelloun-Touimi, Z., P.J. Sansonetti, and C. Parsot.
1995. SepA, the major extracellular protein of Shigella flexneri: autonomous secretion
and involvement in tissue invasion. Mol. Microbiol. 17:123-135.)

alanine-glutamine-alanine. 

m M42 ^ Eagan (S, Geme. ,W,, - ™ 

tiles * Soufcen. -** - — - — « 

longing , phvlogenic division 1 - *- » **— *— " 

Ucs of «caps„l*ed s»ains of *~ ^ 

Sci USA.K77W.77e.). Ch— a! W*- 
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Jpe „ raised * o.uesUo„ of *. »f *is ,0c. in oine, non-,M* b 

isolates again demonstrated a homolog in all cases (Figure 13).
the C54 HA2 gene, and the Rd derived amino acid sequence is 62% identical and
75% similar to C54 HA2. Interestingly, the Rd open reading frame appears to be 
truncated due to a "premature" stop codon. 

Previous experiments revealed that 13 of 15 nontypable strains lacking an
HMW1/HMW2-related protein had evidence of an HA1 homolog (Barenkamp, S.J.,
and J.W. St. Geme, III. Identification of a second family of high molecular weight
adhesion proteins expressed by nontypable Haemophilus influenzae. Mol. Microbiol.,
in press.). Consistent with the demonstration that HA2 and HA1 are homologous,
Southern analysis of these 15 strains, probing with the 3.3 kb fragment of hsf,
demonstrated hybridization in 12 of the same 13 (not shown).

Chromosomal location of the HA2 locus 

In earlier work, the HA1 locus in nontypable strain 11 was found to be flanked
upstream by an open reading frame with significant homology to E. coli
exoribonuclease II (Barenkamp, S.J., and J.W. St. Geme, III. Identification of a second
family of high molecular weight adhesion proteins expressed by nontypable
Haemophilus influenzae. Mol. Microbiol., in press.). Similarly, the HA2 locus in
strain C54 is flanked on the 5' side by an open reading frame with similarity
to E. coli exoribonuclease II. This gene terminates 357 base pairs before the HA2 start
codon and encodes a protein with a predicted amino acid sequence that is 61% similar
and 33% identical at its C-terminal end to exoribonuclease II. Of note, the Rd HA2
homolog is also flanked upstream by the exoribonuclease II locus.

EXAMPLE 3 
Cloning of HA3 

Recombinant phage containing the nontypable Haemophilus strain 32 HA3 gene were
isolated and characterized using methods modified slightly from those described
previously (Barenkamp and St. Geme, Molecular Microbiology 1996, in press). In
brief, chromosomal DNA from strain 32 was prepared by a modification of the method
of Marmur (Marmur, 1961). Sau3A partial restriction digests of the DNA were
prepared and fractionated on 0.7% agarose gels. Fractions containing DNA fragments
in the 9- to 20-kbp range were pooled, and a library was prepared by ligation into
λEMBL3 arms. Ligation mixtures were packaged in vitro with Gigapack®
(Stratagene, La Jolla, CA) and plate amplified in a P2 lysogen of E. coli LE392.

Lambda plaque screening was performed using a mixture of three PCR products
derived from strain 32 chromosomal DNA. These PCR products were amplified using
primer pairs previously shown to amplify DNA segments at the 5' end of the strain
11 HA1 gene. The primers were as follows:

Primer designation    Strand      Sequence

44P                   positive    CCG TGC TTG CCC AAC ACG CTT
64P                   positive    GCT GCC ACC TTG CAC AAC AAC
93G-2                 positive    CTT TCA ATG CCA GAA AGT AGG
18T-1                 negative    CTT CAA CCG TTG CGG ACA ACA

Each of the positive strand primers was used with the single negative strand primer 
to generate the three fragments used for probing the library. 
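
The pairing scheme just described, and the orientation of the negative strand primer relative to the positive strand, can be illustrated with a short script. This is a sketch for orientation only: the variable and function names are hypothetical, and the primer sequences are simply those listed in the primer table.

```python
# Illustrative sketch; names are hypothetical, sequences are from the
# primer table in the text.

POSITIVE_PRIMERS = {
    "44P": "CCGTGCTTGCCCAACACGCTT",
    "64P": "GCTGCCACCTTGCACAACAAC",
    "93G-2": "CTTTCAATGCCAGAAAGTAGG",
}
NEGATIVE_PRIMER_NAME = "18T-1"
NEGATIVE_PRIMER = "CTTCAACCGTTGCGGACAACA"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


# Each positive strand primer is paired with the single negative strand
# primer, giving the three PCR products used as the probe mixture.
primer_pairs = [(name, NEGATIVE_PRIMER_NAME) for name in POSITIVE_PRIMERS]

# Sequence the negative strand primer would correspond to when read on
# the positive strand.
negative_site = reverse_complement(NEGATIVE_PRIMER)

for forward, reverse in primer_pairs:
    print(f"{forward} + {reverse}")
print(negative_site)
```

Read on the positive strand, the negative strand primer corresponds to TGTTGTCCGCAACGGTTGAAG. Searching a template for that string, rather than for the primer as written, is the usual way to locate a reverse primer's binding site.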

The PCR products generated from strain 11 and strain 32 chromosomal DNA were
identical in size, suggesting that the nucleotide sequences of these chromosomal
regions were similar in the two strains. Plaque screening was performed using
standard methodology (Berger and Kimmel, 1987) at high stringency: final wash
conditions were 65°C for 1 hour in buffer containing 2X SSC and 1% SDS. Positive
plaques were identified by autoradiography and plaque purified, and phage DNA was
purified by standard methods. The same primer pairs used to generate the screening
probes were then used to localize the HA3 gene by amplifying various restriction
fragments derived from the phage DNA. Once localized, the strain 32 HA3 gene
and flanking DNA were sequenced using standard methods.

In order to construct strain 32 isogenic Haemophilus influenzae mutants deficient
in expression of the HA3 gene, bacteria were made competent using the MIV method
(Herriott et al., 1970) and were transformed with linearized pHMW8-6, selecting for
kanamycin resistance. Allelic exchange was confirmed by Southern analysis. The
mutants that no longer expressed HA3 exhibited a marked decrease in binding to
Chang epithelial cells, using the methods outlined above (data not shown).

Expression in non-adherent strains of E. coli did not result in adherence, although
it has not been confirmed that the protein was actually expressed.
(B) STREET: Four Embarcadero Center,

(C) CITY: San Francisco

(D) STATE: California

(E) COUNTRY: United States
(F) ZIP: 94111-4187

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(D) SOFTWARE: Patentln Release *j. 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: UNKNOWN 

(B) FILING DATE: 22-MAR-1996 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: US 08/409,995 

(B) FILING DATE: 24-MAR-1995

(viii) ATTORNEY / AGENT INFORMATION: 
(A) NAME: Silva, Robin M.

(B) TELEFAX: (415) 398-3249 

(C) TELEX: 910 277299 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3294 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:
ATGAACAAAA TTTTTAACGT TATTTGGAAT GTTGTGACTC AAACTTGGGT TGTCGTATCT 60

GAACTCACTC GCACCCACAC CAAATGCGCC TCCGCCACCG TGGCGGTTGC CGTATTGGCA 120 

ACCCTGTTGT CCGCAACGGT TGAGGCGAAC AACAATACTC CTGTTACGAA TAAGTTGAAG 180 

GCTTATGGCG ATGCGAATTT TAATTTCACT AATAATTCGA TAGCAGATGC AGAAAAACAA 240 

GTTCAAGAGG CTTATAAAGG TTTATTAAAT CTAAATGAAA AAAATGCGAG TGATAAACTG 300 

TTGGTGGAGG ACAATACTGC GGCGACCGTA GGCAATTTGC GTAAATTGGG CTGGGTATTG 360 

TCTAGCAAAA ACGGCACAAG GAACGAGAAA AGCCAACAAG TCAAACATGC GGATGAAGTG 420 

TTGTTTGAAG GCAAAGGCGG TGTGCAGGTT ACTTCCACCT CTGAAAACGG CAAACACACC 480 

ATTACCTTTG CTTTAGCGAA AGACCTTGGT GTGAAAACTG CGACTGTGAG TGATACCTTA 540 

ACGATTGGCG GTGGTGCTGC TGCAGGTGCT ACAACAACAC CGAAAGTGAA TGTAACTAGT 600 

ACAACTGATG GCTTGAAGTT CGCTAAAGAT GCTGCGGGTG CTAATGGCGA TACTACGGTT 660 

CACTTGAATG GTATTGGTTC AACCTTGACA GACACGCTTG TGGGTTCTCC TGCTACTCAT 720 

ATTGACGGAG GAGATCAAAG TACGCATTAC ACTCGTGCAG CAAGTATCAA GGATGTCTTG 780 

AATGCGGGTT GGAATATCAA GGGTGTTAAA GCTGGCTCAA CAACTGGTCA ATCAGAAAAT 840

GTCGATTTTG TTCATACTTA CGATACTGTT GAGTTCTTGA GTGCGGATAC AGAGACCACG 900 

ACTGTTACTG TAGATAGCAA AGAAAACGGT AAGAGAACCG AAGTTAAAAT CGGTGCGAAG 960 

ACTTCTGTTA TCAAAGAAAA AGACGGTAAG TTATTTACTG GAAAAGCTAA CAAAGAGACA 1020

AATAAAGTTG ATGGTGCTAA CGCGACTGAA GATGCAGACG AAGGCAAAGG CTTAGTGACT 1080 

GCGAAAGATG TGATTGACGC AGTGAATAAG ACTGGTTGGA GAATTAAAAC AACCGATGCT 1140

AATGGTCAAA ATGGCGACTT CGCAACTGTT GCATCAGGCA CAAATGTAAC CTTTGCTAGT 1200 

GGTAATGGTA CAACTGCGAC TGTAACTAAT GGCACCGATG GTATTACCGT TAAGTATGAT 1260 

GCGAAAGTTG GCGACGGCTT AAAACTAGAT GGCGATAAAA TCGCTGCAGA TACGACCGCA 1320 

CTTACTGTGA ATGATGGTAA GAACGCTAAT AATCCGAAAG GTAAAGTGGC TGATGTTGCT 1380 

TCAACTGACG AGAAGAAATT GGTTACAGCA AAAGGTTTAG TAACAGCCTT AAACAGTCTA 1440

AGCTGGACTA CAACTGCTGC TGAGGCGGAC GGTGGTACGC TTGATGGAAA TGCAAGTGAG 1500 

CAAGAAGTTA AAGCGGGCGA TAAAGTAACC TTTAAAGCAG GCAAGAACTT AAAAGTGAAA 1560 

CAAGAGGGTG CGAACTTTAC TTATTCACTG CAAGATGCTT TAACAGGCTT AACGAGCATT 1620 

ACTTTAGGTA CAGGAAATAA TGGTGCGAAA ACTGAAATCA ACAAAGACGG CTTAACCATC 1680 
ACACCAGCAA ATGGTGCGGG TGCAAATAAT GCAAACACCA TCAGCGTAAC CAAAGACGGC 1740

ATTAGTGCGG GCGGTCAGTC GGTTAAAAAC GTTGTGAGCG GACTGAAGAA ATTTGGTGAT 1800

GCGAATTTCG ATCCGCTGAC TAGCTCCGCC GACAACTTAA CGAAACAAAA TGACGATGCC 1860

TATAAAGGCT TGACCAATTT GGATGAAAAA GGTACAGACA AGCAAACTCC AGTTGTTGCC 1920

GACAATACCG CCGCAACCGT GGGCGATTTG CGCGGCTTGG GCTGGGTCAT TTCTGCGGAC 1980

AAAACCACAG GCGGCTCAAC GGAATATCAC GATCAAGTTC GGAATGCGAA CGAAGTGAAA 2040

TTCAAAAGCG GCAACGGTAT CAATGTTTCC GGTAAAACGG TCAACGGTAG GCGTGAAATT 2100

ACTTTTGAAT TGGCTAAAGG TGAAGTGGTT AAATCGAATG AATTTACCGT CAAAGAAACC 2160

AATGGAAAGG AAACGAGCCT GGTTAAAGTT GGCGATAAAT ATTACAGCAA AGAGGATATT 2220

GACTTAACAA CAGGTCAGCC TAAATTAAAA GATGGCAATA CAGTTGCTGC GAAATATCAA 2280

GATAAAGGTG GCAAAGTCGT TTCTGTAACG GATAATACTG AAGCTACCAT AACCAACAAA 2340

GGTTCTGGCT ATGTAACAGG TAACCAAGTG GCAGATGCGA TTGCGAAATC AGGCTTTGAG 2400

CTTGGCTTGG CTGATGAAGC TGATGCGAAA CGGGCGTTTG ATGATAAGAC AAAAGCCTTA 2460

TCTGCTGGTA CAACGGAAAT TGTAAATGCC CACGATAAAG TCCGTTTTGC TAATGGTTTA 2520

AATACCAAAG TGAGCGCGGC AACGGTGGAA AGCACCGATG CAAACGGCGA TAAAGTGACC 2580

ACAACCTTTG TGAAAACCGA TGTGGAATTG CCTTTAACGC AAATCTACAA TACCGATGCA 2640

AACGGTAAGA AAATCACTAA AGTTGTCAAA GATGGGCAAA CTAAATGGTA TGAACTGAAT 2700

GCTGACGGTA CGGCTGATAT GACCAAAGAA GTTACCCTCG GTAACGTGGA TTCAGACGGC 2760

AAGAAAGTTG TGAAAGACAA CGATGGCAAG TGGTATCACG CCAAAGCTGA CGGTACTGCG 2820

GATAAAACCA AAGGCGAAGT GAGCAATGAT AAAGTTTCTA CCGATGAAAA ACACGTTGTC 2880

AGCCTTGATC CAAATGATCA ATCAAAAGGT AAAGGTGTCG TGATTGACAA TGTGGCTAAT 2940

GGCGATATTT CTGCCACTTC CACCGATGCG ATTAACGGAA GTCAGTTGTA TGCTGTGGCA 3000

AAAGGGGTAA CAAACCTTGC TGGACAAGTG AATAATCTTG AGGGCAAAGT GAATAAAGTG 3060

GGCAAACGTG CAGATGCAGG TACAGCAAGT GCATTAGCGG CTTCACAGTT ACCACAAGCC 3120

ACTATGCCAG GTAAATCAAT GGTTGCTATT GCGGGAAGTA GTTATCAAGG TCAAAATGGT 3180

TTAGCTATCG GGGTATCAAG AATTTCCGAT AATGGCAAAG TGATTATTCG CTTGTCAGGC 3240

ACAACCAATA GTCAAGGTAA AACAGGCGTT GCAGCAGGTG TTGGTTACCA GTGG 3294

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1098 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr Gln Thr Trp
1               5                   10                  15

Val Val Val Ser Glu Leu Thr Arg Thr His Thr Lys Cys Ala Ser Ala
            20                  25                  30

Thr Val Ala Val Ala Val Leu Ala Thr Leu Leu Ser Ala Thr Val Glu
        35                  40                  45

Ala Asn Asn Asn Thr Pro Val Thr Asn Lys Leu Lys Ala Tyr Gly Asp
    50                  55                  60

Ala Asn Phe Asn Phe Thr Asn Asn Ser Ile Ala Asp Ala Glu Lys Gln
65                  70                  75                  80

Val Gln Glu Ala Tyr Lys Gly Leu Leu Asn Leu Asn Glu Lys Asn Ala
                85                  90                  95

Ser Asp Lys Leu Leu Val Glu Asp Asn Thr Ala Ala Thr Val Gly Asn
            100                 105                 110

Leu Arg Lys Leu Gly Trp Val Leu Ser Ser Lys Asn Gly Thr Arg Asn
        115                 120                 125

Glu Lys Ser Gln Gln Val Lys His Ala Asp Glu Val Leu Phe Glu Gly
    130                 135                 140

Lys Gly Gly Val Gln Val Thr Ser Thr Ser Glu Asn Gly Lys His Thr
145                 150                 155                 160

Ile Thr Phe Ala Leu Ala Lys Asp Leu Gly Val Lys Thr Ala Thr Val
                165                 170                 175

Ser Asp Thr Leu Thr Ile Gly Gly Gly Ala Ala Ala Gly Ala Thr Thr
            180                 185                 190

Thr Pro Lys Val Asn Val Thr Ser Thr Thr Asp Gly Leu Lys Phe Ala
        195                 200                 205

Lys Asp Ala Ala Gly Ala Asn Gly Asp Thr Thr Val His Leu Asn Gly
    210                 215                 220

Ile Gly Ser Thr Leu Thr Asp Thr Leu Val Gly Ser Pro Ala Thr His
225                 230                 235                 240
Ile Asp Gly Gly Asp Gln Ser Thr His Tyr Thr Arg Ala Ala Ser Ile
                245                 250                 255

Lys Asp Val Leu Asn Ala Gly Trp Asn Ile Lys Gly Val Lys Ala Gly
            260                 265                 270

Ser Thr Thr Gly Gln Ser Glu Asn Val Asp Phe Val His Thr Tyr Asp
        275                 280                 285

Thr Val Glu Phe Leu Ser Ala Asp Thr Glu Thr Thr Thr Val Thr Val
    290                 295                 300

Asp Ser Lys Glu Asn Gly Lys Arg Thr Glu Val Lys Ile Gly Ala Lys
305                 310                 315                 320

Thr Ser Val Ile Lys Glu Lys Asp Gly Lys Leu Phe Thr Gly Lys Ala
                325                 330                 335

Asn Lys Glu Thr Asn Lys Val Asp Gly Ala Asn Ala Thr Glu Asp Ala
            340                 345                 350

Asp Glu Gly Lys Gly Leu Val Thr Ala Lys Asp Val Ile Asp Ala Val
        355                 360                 365

Asn Lys Thr Gly Trp Arg Ile Lys Thr Thr Asp Ala Asn Gly Gln Asn
    370                 375                 380

Gly Asp Phe Ala Thr Val Ala Ser Gly Thr Asn Val Thr Phe Ala Ser
385                 390                 395                 400

Gly Asn Gly Thr Thr Ala Thr Val Thr Asn Gly Thr Asp Gly Ile Thr
                405                 410                 415

Val Lys Tyr Asp Ala Lys Val Gly Asp Gly Leu Lys Leu Asp Gly Asp
            420                 425                 430

Lys Ile Ala Ala Asp Thr Thr Ala Leu Thr Val Asn Asp Gly Lys Asn
        435                 440                 445

Ala Asn Asn Pro Lys Gly Lys Val Ala Asp Val Ala Ser Thr Asp Glu
    450                 455                 460

Lys Lys Leu Val Thr Ala Lys Gly Leu Val Thr Ala Leu Asn Ser Leu
465                 470                 475                 480

Ser Trp Thr Thr Thr Ala Ala Glu Ala Asp Gly Gly Thr Leu Asp Gly
                485                 490                 495

Asn Ala Ser Glu Gln Glu Val Lys Ala Gly Asp Lys Val Thr Phe Lys
            500                 505                 510

Ala Gly Lys Asn Leu Lys Val Lys Gln Glu Gly Ala Asn Phe Thr Tyr
        515                 520                 525

Ser Leu Gln Asp Ala Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Thr
    530                 535                 540
Gly Asn Asn Gly Ala Lys Thr Glu Ile Asn Lys Asp Gly Leu Thr Ile
545                 550                 555                 560

Thr Pro Ala Asn Gly Ala Gly Ala Asn Asn Ala Asn Thr Ile Ser Val
                565                 570                 575

Thr Lys Asp Gly Ile Ser Ala Gly Gly Gln Ser Val Lys Asn Val Val
            580                 585                 590

Ser Gly Leu Lys Lys Phe Gly Asp Ala Asn Phe Asp Pro Leu Thr Ser
        595                 600                 605

Ser Ala Asp Asn Leu Thr Lys Gln Asn Asp Asp Ala Tyr Lys Gly Leu
    610                 615                 620

Thr Asn Leu Asp Glu Lys Gly Thr Asp Lys Gln Thr Pro Val Val Ala
625                 630                 635                 640

Asp Asn Thr Ala Ala Thr Val Gly Asp Leu Arg Gly Leu Gly Trp Val
                645                 650                 655

Ile Ser Ala Asp Lys Thr Thr Gly Gly Ser Thr Glu Tyr His Asp Gln
            660                 665                 670

Val Arg Asn Ala Asn Glu Val Lys Phe Lys Ser Gly Asn Gly Ile Asn
        675                 680                 685

Val Ser Gly Lys Thr Val Asn Gly Arg Arg Glu Ile Thr Phe Glu Leu
    690                 695                 700

Ala Lys Gly Glu Val Val Lys Ser Asn Glu Phe Thr Val Lys Glu Thr
705                 710                 715                 720

Asn Gly Lys Glu Thr Ser Leu Val Lys Val Gly Asp Lys Tyr Tyr Ser
                725                 730                 735

Lys Glu Asp Ile Asp Leu Thr Thr Gly Gln Pro Lys Leu Lys Asp Gly
            740                 745                 750

Asn Thr Val Ala Ala Lys Tyr Gln Asp Lys Gly Gly Lys Val Val Ser
        755                 760                 765

Val Thr Asp Asn Thr Glu Ala Thr Ile Thr Asn Lys Gly Ser Gly Tyr
    770                 775                 780

Val Thr Gly Asn Gln Val Ala Asp Ala Ile Ala Lys Ser Gly Phe Glu
785                 790                 795                 800

Leu Gly Leu Ala Asp Glu Ala Asp Ala Lys Arg Ala Phe Asp Asp Lys
                805                 810                 815

Thr Lys Ala Leu Ser Ala Gly Thr Thr Glu Ile Val Asn Ala His Asp
            820                 825                 830

Lys Val Arg Phe Ala Asn Gly Leu Asn Thr Lys Val Ser Ala Ala Thr
        835                 840                 845
Val Glu Ser Thr Asp Ala Asn Gly Asp Lys Val Thr Thr Thr Phe Val
    850                 855                 860

Lys Thr Asp Val Glu Leu Pro Leu Thr Gln Ile Tyr Asn Thr Asp Ala
865                 870                 875                 880

Asn Gly Lys Lys Ile Thr Lys Val Val Lys Asp Gly Gln Thr Lys Trp
                885                 890                 895

Tyr Glu Leu Asn Ala Asp Gly Thr Ala Asp Met Thr Lys Glu Val Thr
            900                 905                 910

Leu Gly Asn Val Asp Ser Asp Gly Lys Lys Val Val Lys Asp Asn Asp
        915                 920                 925

Gly Lys Trp Tyr His Ala Lys Ala Asp Gly Thr Ala Asp Lys Thr Lys
    930                 935                 940

Gly Glu Val Ser Asn Asp Lys Val Ser Thr Asp Glu Lys His Val Val
945                 950                 955                 960

Ser Leu Asp Pro Asn Asp Gln Ser Lys Gly Lys Gly Val Val Ile Asp
                965                 970                 975

Asn Val Ala Asn Gly Asp Ile Ser Ala Thr Ser Thr Asp Ala Ile Asn
            980                 985                 990

Gly Ser Gln Leu Tyr Ala Val Ala Lys Gly Val Thr Asn Leu Ala Gly
        995                 1000                1005

Gln Val Asn Asn Leu Glu Gly Lys Val Asn Lys Val Gly Lys Arg Ala
    1010                1015                1020

Asp Ala Gly Thr Ala Ser Ala Leu Ala Ala Ser Gln Leu Pro Gln Ala
1025                1030                1035                1040

Thr Met Pro Gly Lys Ser Met Val Ala Ile Ala Gly Ser Ser Tyr Gln
                1045                1050                1055

Gly Gln Asn Gly Leu Ala Ile Gly Val Ser Arg Ile Ser Asp Asn Gly
            1060                1065                1070

Lys Val Ile Ile Arg Leu Ser Gly Thr Thr Asn Ser Gln Gly Lys Thr
        1075                1080                1085

Gly Val Ala Ala Gly Val Gly Tyr Gln Trp
    1090                1095
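
The correspondence between the nucleotide listing of SEQ ID NO:1 and the amino acid listing of SEQ ID NO:2 can be spot-checked by conceptual translation. The sketch below is a minimal illustration under stated assumptions: it uses the standard genetic code, the function names are hypothetical, and only the first 60 and last 30 nucleotides of SEQ ID NO:1 are copied in, so it checks the two ends of the listing rather than the whole sequence.

```python
# Illustrative sketch; assumes the standard genetic code. Names are
# hypothetical and only the ends of SEQ ID NO:1 are included.
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Codons are generated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...),
# which matches the order of AMINO_ACIDS.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "Ter",
}


def translate(dna):
    """Translate an in-frame DNA string into one-letter amino acids."""
    dna = dna.upper().replace(" ", "")
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def three_letter(protein):
    """Render a one-letter protein string in three-letter code."""
    return " ".join(ONE_TO_THREE[aa] for aa in protein)


# Nucleotides 1-60 and 3265-3294 of SEQ ID NO:1.
FIRST_60 = "ATGAACAAAATTTTTAACGTTATTTGGAATGTTGTGACTCAAACTTGGGTTGTCGTATCT"
LAST_30 = "GGCGTTGCAGCAGGTGTTGGTTACCAGTGG"

n_terminus = translate(FIRST_60)
c_terminus = translate(LAST_30)

print(three_letter(n_terminus))
print(three_letter(c_terminus))
print(3294 // 3)  # residues expected from a 3294-nucleotide listing
```

The translated ends can be compared with residues 1 to 20 and 1089 to 1098 of SEQ ID NO:2, and the residue count with the length of 1098 amino acids stated for that sequence.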

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7291 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : unknown 

(D) TOPOLOGY : unknown 
(ii) MOLECULE TYPE: DNA (genomic)

(ix) FEATURE:

(A) NAME/KEY: CDS

(B) LOCATION: 163..7221

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

TTTNTTTTTC TTATTTTTTT riTlTm A 1 TlTTT T'lU l TTGAGGCTAA ACTTTTNGNA 

AAATATCACT TTTTTATTCT CCAAATATAG AATAGAATAC GCACGATTTC ACTAAGAAAA 

GTATATTTAT CATTAATTTT ATTAAATATA AGGTAAATAA AA ATG AAC AAA ATT 

Met Asn Lys lie 
1 

TTT AAC GTT ATT TGG AAT GTT ATG ACT CAA ACT TGG GTT GTC GTA TCT 222 
Phe Asn Val He Trp Asn Val Met Thr Gin Thr Trp Val Val Val Ser 
5 10 15 20 

GAA CTC ACT CGC ACC CAC ACC AAA CGC GCC TCC GCA ACC GTG GAG ACC 270 
Glu Leu Thr Arg Thr His Thr Lys Arg Ala Ser Ala Thr Val Glu Thr 
25 30 35 

GCC GTA TTG GCG ACA CTG TTG TTT GCA ACG GTT CAG GCG AAT GOT ACC 318 
Ala Val Leu Ala Thr Leu Leu Phe Ala Thr Val Gin Ala Asn Ala Thr 
40 45 50 

GAT GAA GAT GAA GAG TTA GAC CCC GTA GTA CGC ACT GCT CCC GTG TTG 366 
Asp Glu Asp Glu Glu Leu Asp Pro Val Val Arg Thr Ala Pro Val Leu 
55 60 65 

AGC TTC CAT TCC GAT AAA GAA GGC ACG GGA GAA AAA GAA GTT ACA GAA 414 
Ser Phe His Ser Asp Lys Glu Gly Thr Gly Glu Lys Glu Val Thr Glu 
70 75 80 

AAT TCA AAT TGG GGA ATA TAT TTC GAC AAT AAA GGA GTA CTA AAA GCC 462 
Asn Ser Asn Trp Gly He Tyr Phe Asp Asn Lys Gly Val Leu Lys Ala 
85 90 95 100 

GGA GCA ATC ACC CTC AAA GCC GGC GAC AAC CTG AAA ATC AAA CAA AAC 
Gly Ala He Thr Leu Lys Ala. Gly Asp Asn Leu Lys He Lys Gin Asn 
105 HO 115 

ACC GAT GAA AGC ACC AAT GCC AGT AGC TTC ACC TAC TCG CTG AAA AAA 
Thr Asp Glu Ser Thr Asn Ala Ser Ser Phe Thr Tyr Ser Leu Lys Lys 
120 125 130 

GAC CTC ACA GAT CTG ACC AGT GTT GCA ACT GAA AAA TTA TCG TTT GGC 606 
Asp Leu Thr Asp Leu Thr Ser Val Ala Thr Glu Lys Leu Ser Phe Gly 
135 140 145 

GCA AAC GGC GAT AAA GTT GAT ATT ACC AGT GAT GCA AAT GGC TTG AAA 654 
Ala Asn Gly Asp Lys Val Asp He Thr Ser Asp Ala Asn Gly Leu Lys 
150 155 160 



510 



55 
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TTG GCG AAA ACA GGT AAC GGA AAT GTT CAT TTG AAT GGT TTG GAT TCA 
I^S^Thr Gly Asn Gly Asn Val His Leu Asn Gly Leu Asp Ser 
165 170 175 



ACT TTG CCT GAT GCG GTA ACG AAT ACA GGT GTG TTA ACT TCA TCA AGT 
rZ ™ IZ Asp Ala Val Thr Asn Thr Gly Val Leu Ser Ser Ser Ser 

18 5 

TTT ACA CCT AAT GAT GTT GAA AAA ACA AGA GCT GCA ACT GTT AAA GAT 
£ iS IZ Asn Asp val Glu Lys Thr Arc, Ala Ala Thr Val Lys Asp 
200 205 210 

GTT TTA AAT GCA GGT TGG AAC ATT AAA GGT GCT AAA ACT GCT GGA GGT 
Sal Zu £n Ala Gly Trp Asn lie Lys Gly Ala Lys Thr Ala Gly Gly 



215 



220 



AAT GTT GAG AGT GTT GAT TTA GTG TCC GCT TAT AAT AAT GTT GAA TTT 
Asn Val Glu Ser Val Asp Leu Val Ser Ala Tyr Asn Asn Val Glu Phe 
230 235 240 

ATT ACA GGC GAT AAA AAC ACG CTT GAT GTT GTA TTA ACA GCT AAA GAA 
iZ tS Gly Asp Lys Asn Thr Leu Asp Val Val Leu Thr Ala Lys Glu 
245 250 255 

AAC GGT AAA ACA ACC GAA GTG AAA TTC ACA CCG AAA ACC TCT GTT ATC 
Asn Gly Lys Thr Thr Glu Val Lys Phe Thr Pro Lys Thr Ser Val lie 
265 270 275 

AAA GAA AAA GAC GGT AAG TTA TTT ACT GGA AAA GAG AAT AAC GAC ACA 
Lys Glu Lys Asp Gly Lys Leu Phe Thr Gly Lys Glu Asn Asn Asp Thr 

280 285 

AAT AAA GTT ACA AGT AAC ACG GCG ACT GAT AAT ACA GAT GAG GGT AAT 
£n Lys Val Thr Ser Asn Thr Ala Thr Asp Asn Thr Asp Glu Gly Asn 
295 305 

GGC TTA GTC ACT GCA AAA GCT GTG ATT GAT GCT GTG AAC AAG GCT GGT 
Sy ™ vll tZ Ala Lys Ala Val lie Asp Ala Val Asn Lys Ala Gly 
310 315 

w „. rTT juwv ACA A CT A CT GCT AAT GGT CAA AAT GGC GAC TTC GCA 

Thr Ala Asn Gly Gin Asn Gly Asp Phe Ala 
325 "0 » 5 

ACT GTT GCG TCA GGC ACA AAT GTA ACC TTT GAA AGT GGC GAT GGT ACA 
Thr Sal S Ser Gly Thr Asn Val Thr Phe Glu Ser Gly Asp Gly Thr 

345 350 

ACA GCG TCA GTA ACT AAA GAT ACT AAC GGC AAT GGC ATC ACT GTT AAG 
Thr Ala Ser Val Thr Lys Asp Thr Asn Gly Asn Gly lie Thr Val uy 



360 



r*nr r*r nrr TTG AAA TTT GAT AGC GAT AAA AAA 
Tar 1 oar OTG AAA GTT GGC GAC GGC 1 l/v» m#w* „ _ 

™ Z £S iyl val Gly Asp Gly Leu Lys Phe Asp Ser Asp Lys Lys 

375 380 



702 



750 



798 



846 



894 



942 



990 



1038 



1086 



1134 



1182 



1230 



1278 



1326 
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ATC GTT GCA GAT ACG ACC GCA CTT ACT GTG ACA GGT GGT AAG GTA GCT 1374 
lie Val Ala Asp Thr Thr Ala Leu Thr Val Thr Gly Gly Lys Val Ala 
390 395 400 

GAA ATT GCT AAA GAA GAT GAC AAG AAA AAA CTT CTT AAT GCA GGC GAT 1422 
Glu lie Ala Lys Glu Asp Asp Lys Lys Lys Leu Val Asn Ala Gly Asp 
405 410 415 420 

TTG GTA ACA GCT TTA GGT AAT CTA AGT TGG AAA GCA AAA GCT GAG GCT 1470 
Leu Val Thr Ala Leu Gly Asn Leu Ser Trp Lys Ala Lys Ala Glu Ala 
425 430 435 

GAT ACT GAT GGT GCG CTT GAG GGG ATT TCA AAA GAC CAA GAA GTC AAA 1518 
Asp Thr Asp Gly Ala Leu Glu Gly lie Ser Lys Asp Gin Glu Val Lys 
440 445 450 

GCA GGC GAA ACG GTA ACC TTT AAA GCG GGC AAG AAC TTA AAA GTG AAA 1566 
Ala Gly Glu Thr Val Thr Phe Lys Ala Gly Lys Asn Leu Lys Val Lys 
455 460 465 

CAG GAT GGT GCG AAC TTT ACT TAT TCA CTG CAA GAT GCT TTA ACG GGT 1614 
Gin Asp Gly Ala Asn Phe Thr Tyr Ser Leu Gin Asp Ala Leu Thr Gly 
470 475 480 

TTA ACG AGC ATT ACT TTA GGT GGT ACA ACT AAT GGC GGA AAT GAT GCG 1662 
Leu Thr Ser He Thr Leu Gly Gly Thr Thr Asn Gly Gly Asn Asp Ala 
485 490 495 500 

AAA ACC GTC ATC AAC AAA GAC GGT TTA ACC ATC ACG CCA GCA GGT AAT 1710 
Lys Thr Val He Asn Lys Asp Gly Leu Thr He Thr Pro Ala Gly Asn 
505 510 515 

GGC GGT ACG ACA GGT ACA AAC ACC ATC AGC GTA ACC AAA GAT GGC ATT 1758 
Gly Gly Thr Thr Gly Thr Asn Thr He Ser Val Thr Lys Asp Gly He 
520 525 530 

AAA GCA GGT AAT AAA GCT ATT ACT AAT GTT GCG AGT GGT TTA AGA GCT 1806 
Lys Ala Gly Asn Lys Ala He Thr Asn Val Ala Ser Gly Leu Arg Ala 
535 540 545 

TAT GAC GAT GCG AAT TTT GAT GTT TTA AAT AAC TCT GCA ACT GAT TTA 1854 
Tyr Asp Asp Ala Asn Phe Asp Val Leu Asn Asn Ser Ala Thr Asp Leu 
550 555 560 

AAT AGA CAC GTT GAA GAT GCT TAT AAA GGT TTA TTA AAT CTA AAT GAA 1902 
Asn Arg His Val Glu Asp Ala Tyr Lys Gly Leu Leu Asn Leu Asn Glu 
565 570 575 580 

AAA AAT GCA AAT AAA CAA CCG TTG GTG ACT GAC AGC ACG GCG GCG ACT 1950 
Lys Asn Ala Asn Lys Gin Pro Leu" Val Thr Asp Ser Thr Ala Ala Thr 
585 590 595 

GTA GGC GAT TTA CGT AAA TTG GGT TGG GTA GTA TCA ACC AAA AAC GGT 1998 
Val Gly Asp Leu Arg Lys Leu Gly Trp Val Val Ser Thr Lys Asn Gly 
600 605 610 
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= £ = = = = = 5S22S = ™ = 

615 620 
630 635 640 

645 6 50 " 5 

665 670 
^ ™ R rr rrr GGT AAT AAT GGT ACT GCT GTC ACT AAA GGT 

SSSSSSSS- «y * - «x »r ,y «, 

680 685 

GGC TTT GAA ACT GTT AAA ACT GGA GCG ACT GAT GCA GAT CGC GGT AAA 
cty ^ Glu £ val Lys Thr Gly Ala Thr Asp Ala Asp Arg Gly Lys 
695 700 705 

CTA ACT GTA AAA GAT GCT ACT GCT AAT GAC GCT GAT AAG AAA GTC GCA 
S i£ Si Lys Asp Ala Thr Ala Asn Asp Ala Asp Lys Lys Val Ala 

710 715 720 

. . . ___ frr-p gca A CC GCA ATT AAT AGT GCG GCG ACT TTT GTG 

E £ £ S " S S. -a XU «n « »a «. « ». VJ1 

730 '3d 

760 765 

„_ (~tt AAA CGT GAT GGA AAA AAT ATT ACT TTT 

GCA GGT AAA AAC CTG AAA GTT AAA CGT GA ^ ^ ^ ^ 

Ala Gly Lys Asn Leu Lys Val Lys Arg «sp r 

775 780 

„ - aaa ACT GCG AAA GTG AGT GAT ACT 

SSSE-EXSS* - - - - - 

790 795 

„ »** am CCT ACA GGT GGC ACT ACT GCG ACG CCA 

™ S S S S S S S « OXV OXV « ~ - s 
605 •» 

ry- T TTG AAT TTT GCA AAA GAA 

AAA GTG AAT ATT ACT AGC ACG GCT GAT GGT TTG AAT ^ ^ ^ ^ 

Lys val Asn He Thr Ser Thr Aia asp r 
825 830 



2046 



2094 



2142 



2190 



2238 



2286 



2334 



2382 



2430 



247B 



2526 



2574 



2622 
ACA GCC GAT GCC TCG GGT TCT AAG AAT GTT TAT TTG AAA GGT ATT GCG 2718
Thr Ala Asp Ala Ser Gly Ser Lys Asn Val Tyr Leu Lys Gly Ile Ala
840 845 850

ACA ACT TTA ACT GAG CCA AGC GCG GGA GCG AAG TCT TCA CAC GTT GAT 2766
Thr Thr Leu Thr Glu Pro Ser Ala Gly Ala Lys Ser Ser His Val Asp
855 860 865

TTA AAT GTG GAT GCG ACG AAA AAA TCC AAT GCA GCA AGT ATT GAA GAT 2814
Leu Asn Val Asp Ala Thr Lys Lys Ser Asn Ala Ala Ser Ile Glu Asp
870 875 880

GTA TTG CGC GCA GGT TGG AAT ATT GAA GGT AAT GGT AAT AAT GTT GAT 2862
Val Leu Arg Ala Gly Trp Asn Ile Gln Gly Asn Gly Asn Asn Val Asp
885 890 895 900

TAT GTA GCG ACG TAT GAC ACA GTA AAC TTT ACC GAT GAC AGC ACA GGT 2910
Tyr Val Ala Thr Tyr Asp Thr Val Asn Phe Thr Asp Asp Ser Thr Gly
905 910 915

ACA ACA ACG GTA ACC GTA ACC CAA AAA GCA GAT GGC AAA GGT GCT GAC 2958
Thr Thr Thr Val Thr Val Thr Gln Lys Ala Asp Gly Lys Gly Ala Asp
920 925 930

GTT AAA ATC GGT GCG AAA ACT TCT GTT ATC AAA GAC CAC AAC GGC AAA 3006
Val Lys Ile Gly Ala Lys Thr Ser Val Ile Lys Asp His Asn Gly Lys
935 940 945

CTG TTT ACA GGC AAA GAC CTG AAA GAT GCG AAT AAT GGT GCA ACC GTT 3054
Leu Phe Thr Gly Lys Asp Leu Lys Asp Ala Asn Asn Gly Ala Thr Val
950 955 960

AGT GAA GAT GAT GGC AAA GAC ACC GGC ACA GGC TTA GTT ACT GCA AAA 3102
Ser Glu Asp Asp Gly Lys Asp Thr Gly Thr Gly Leu Val Thr Ala Lys
965 970 975 980

ACT GTG ATT GAT GCA GTA AAT AAA AGC GGT TGG AGG GTA ACC GGT GAG 3150
Thr Val Ile Asp Ala Val Asn Lys Ser Gly Trp Arg Val Thr Gly Glu
985 990 995

GGC GCG ACT GCC GAA ACC GGT GCA ACC GCC GTG AAT GCG GGT AAC GCT 3198
Gly Ala Thr Ala Glu Thr Gly Ala Thr Ala Val Asn Ala Gly Asn Ala
1000 1005 1010

GAA ACC GTT ACA TCA GGC ACG AGC GTG AAC TTC AAA AAC GGC AAT GCG 3246
Glu Thr Val Thr Ser Gly Thr Ser Val Asn Phe Lys Asn Gly Asn Ala
1015 1020 1025

ACC ACA GCG ACC GTA AGC AAA GAT AAT GGC AAC ATC AAT GTC AAA TAC 3294
Thr Thr Ala Thr Val Ser Lys Asp Asn Gly Asn Ile Asn Val Lys Tyr
1030 1035 1040

GAT GTA AAT GTT GGT GAC GGC TTG AAG ATT GGC GAT GAC AAA AAA ATC 3342
Asp Val Asn Val Gly Asp Gly Leu Lys Ile Gly Asp Asp Lys Lys Ile
1045 1050 1055 1060
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*™ ott HCT GTA ACA GGT GGT AAG GTG TCT GTT 

sssssssss « -t «y «. - ~ « 

1065 1070 

1080 X085 

GAG GGT TTA GCG ACT GCT TTA AAC AAC CTA AGC TGG ACG GCA AAA GCC
Glu Gly Leu Ala Thr Ala Leu Asn Asn Leu Ser Trp Thr Ala Lys Ala
1095 1100 

SSS2Z5 = = = "™~ S 

mo 1115 

AAA GCA GGC GAC AAA GTA ACC TTT AAA GCA GGC AAG AAC TTA AAA GTG
Lys Ala Gly Asp Lys Val Thr Phe Lys Ala Gly Lys Asn Leu Lys Val
1125 1130 1135

CAG TCT GAA AAA GAC TTT ACT TAT TCA CTG CAA GAC ACT TTA ACA 
Gln Ser Glu Lys Asp Phe Thr Tyr Ser Leu Gln Asp Thr Leu Thr



1145 



5 = 5 = S £ = 5 S£ E = - £ = = 

s s = s = = - = s = s s s = s = 

1175 1180 1185 

1190 1195 

_ Baa rjvc GGC ATT ACT GCG GGT AAT AAA GAA ATT ACC AAT GTT AAG 
ACC AAA GAC GGC ATT ACT ^ ^ y>1 Lys 

Thr Lys Asp Gly He. Ser Ala e±y «s» r 1220 
1205 I 210 1215 

~ WI aan ACC TAT AAA GAT ACT CAA AAC ACT GCA GAT GAA ACA 

Z S 2 ESSE -p «- "»r Tht " a " p "» ThI 

1225 1230 

CAA GAT AAA GAG TTC CAC GCC GCC GTT AAA AAC GCA AAT GAA GTT GAG
Gln Asp Lys Glu Phe His Ala Ala Val Lys Asn Ala Asn Glu Val Glu

1240 1245 

TTC GTG GGT AAA AAC GGT GCA ACC GTG TCT GCA AAA ACT GAT AAC AAC
Phe Val Gly Lys Asn Gly Ala Thr Val Ser Ala Lys Thr Asp Asn Asn

1255 1260 



[Line-end nucleotide positions for the preceding sequence lines: 3390, 3438, 3486, 3534, 3582, 3630, 3678, 3726, 3774, 3822, 3870, 3918, 3966, 4014]



1270 1275 
GGT CTT GAA AAA GAT ACT GAC GGC AAG ATT AAA CTC AAA GTA GAT AAT 4062
Gly Leu Glu Lys Asp Thr Asp Gly Lys Ile Lys Leu Lys Val Asp Asn
1285 1290 1295 1300

ACA GAT GGG AAT AAT CTA TTA ACC GTT GAT GCA ACA AAA GGT GCA TCC 4110
Thr Asp Gly Asn Asn Leu Leu Thr Val Asp Ala Thr Lys Gly Ala Ser
1305 1310 1315

GTT GCC AAG GGC GAG TTT AAT GCC GTA ACA ACA GAT GCA ACT ACA GCC 4158
Val Ala Lys Gly Glu Phe Asn Ala Val Thr Thr Asp Ala Thr Thr Ala
1320 1325 1330

CAA GGC ACA AAT GCC AAT GAG CGC GGT AAA GTG GTT GTC AAG GGT TCA 4206
Gln Gly Thr Asn Ala Asn Glu Arg Gly Lys Val Val Val Lys Gly Ser
1335 1340 1345

AAT GGT GCA ACT GCT ACC GAA ACT GAC AAG AAA AAA GTG GCA ACT GTT 4254
Asn Gly Ala Thr Ala Thr Glu Thr Asp Lys Lys Lys Val Ala Thr Val
1350 1355 1360

GGC GAC GTT GCT AAA GCG ATT AAC GAC GCA GCA ACT TTC GTG AAA GTG 4302
Gly Asp Val Ala Lys Ala Ile Asn Asp Ala Ala Thr Phe Val Lys Val
1365 1370 1375 1380

GAA AAT GAC GAC AGT GCT ACG ATT GAT GAT AGC CCA ACA GAT GAT GGC 4350
Glu Asn Asp Asp Ser Ala Thr Ile Asp Asp Ser Pro Thr Asp Asp Gly
1385 1390 1395

GCA AAT GAT GCT CTC AAA GCA GGC GAC ACC TTG ACC TTA AAA GCG GGT 4398
Ala Asn Asp Ala Leu Lys Ala Gly Asp Thr Leu Thr Leu Lys Ala Gly
1400 1405 1410

AAA AAC TTA AAA GTT AAA CGT GAT GGT AAA AAT ATT ACT TTT GCC CTT 4446
Lys Asn Leu Lys Val Lys Arg Asp Gly Lys Asn Ile Thr Phe Ala Leu
1415 1420 1425

GCG AAC GAC CTT AGT GTA AAA AGC GCA ACC GTT AGC GAT AAA TTA TCG 4494
Ala Asn Asp Leu Ser Val Lys Ser Ala Thr Val Ser Asp Lys Leu Ser
1430 1435 1440

CTT GGT ACA AAC GGC AAT AAA GTC AAT ATC ACA AGC GAC ACC AAA GGC 4542
Leu Gly Thr Asn Gly Asn Lys Val Asn Ile Thr Ser Asp Thr Lys Gly
1445 1450 1455 1460

TTG AAC TTC GCT AAA GAT AGT AAG ACA GGC GAT GAT GCT AAT ATT CAC 4590
Leu Asn Phe Ala Lys Asp Ser Lys Thr Gly Asp Asp Ala Asn Ile His
1465 1470 1475

TTA AAT GGC ATT GCT TCA ACT TTA ACT GAT ACA TTG TTA AAT AGT GGT 4638
Leu Asn Gly Ile Ala Ser Thr Leu Thr Asp Thr Leu Leu Asn Ser Gly
1480 1485 1490

GCG ACA ACC AAT TTA GGT GGT AAT GGT ATT ACT GAT AAC GAG AAA AAA 4686
Ala Thr Thr Asn Leu Gly Gly Asn Gly Ile Thr Asp Asn Glu Lys Lys
1495 1500 1505
CGC GCG GCG AGC GTT AAA GAT GTC TTG AAT GCG GGT TGG AAT GTT CGT 4734
Arg Ala Ala Ser Val Lys Asp Val Leu Asn Ala Gly Trp Asn Val Arg
1510 1515 1520

GGT GTT AAA CCG GCA TCT GCA AAT AAT CAA GTG GAG AAT ATC GAC TTT 4782
Gly Val Lys Pro Ala Ser Ala Asn Asn Gln Val Glu Asn Ile Asp Phe
1525 1530 1535 1540

GTA GCA ACC TAC GAC ACA GTG GAC TTT GTT ACT GGA GAT AAA GAC ACC 4830
Val Ala Thr Tyr Asp Thr Val Asp Phe Val Ser Gly Asp Lys Asp Thr
1545 1550 1555

ACG ACT GTA ACT GTT GAA ACT AAA GAT AAT GGC AAG AGA ACC GAA GTT 4878
Thr Ser Val Thr Val Glu Ser Lys Asp Asn Gly Lys Arg Thr Glu Val
1560 1565 1570

AAA ATC GGT GCG AAG ACT TCT GTT ATC AAA GAC CAC AAC GGC AAA CTG 4926
Lys Ile Gly Ala Lys Thr Ser Val Ile Lys Asp His Asn Gly Lys Leu
1575 1580 1585

TTT ACA GGC AAA GAG CTG AAG GAT GCT AAC AAT AAT GGC GTA ACT GTT 4974
Phe Thr Gly Lys Glu Leu Lys Asp Ala Asn Asn Asn Gly Val Thr Val
1590 1595 1600

ACC GAA ACC GAC GGC AAA GAC GAG GGT AAT GGT TTA GTG ACT GCA AAA 5022
Thr Glu Thr Asp Gly Lys Asp Glu Gly Asn Gly Leu Val Thr Ala Lys
1605 1610 1615 1620

GCT GTG ATT GAT GCC GTG AAT AAG GCT GGT TGG AGA GTT AAA ACA ACA 5070
Ala Val Ile Asp Ala Val Asn Lys Ala Gly Trp Arg Val Lys Thr Thr
1625 1630 1635

GGT GCT AAT GGT CAG AAT GAT GAC TTC GCA ACT GTT GCG TCA GGC ACA 5118
Gly Ala Asn Gly Gln Asn Asp Asp Phe Ala Thr Val Ala Ser Gly Thr
1640 1645 1650

AAT GTA ACC TTT GCT GAT GGT AAT GGC ACA ACT GCC GAA GTA ACT AAA 5166
Asn Val Thr Phe Ala Asp Gly Asn Gly Thr Thr Ala Glu Val Thr Lys
1655 1660 1665

GCA AAC GAC GGT AGT ATT ACT GTT AAA TAC AAT GTT AAA GTG GCT GAT 5214
Ala Asn Asp Gly Ser Ile Thr Val Lys Tyr Asn Val Lys Val Ala Asp
1670 1675 1680

GGC TTA AAA CTA GAC GGC GAT AAA ATC GTT GCA GAC ACG ACC GTA CTT 5262
Gly Leu Lys Leu Asp Gly Asp Lys Ile Val Ala Asp Thr Thr Val Leu
1685 1690 1695 1700

ACT GTG GCA GAT GGT AAA GTT ACA GCT CCG AAT AAT GGC GAT GGT AAG 5310
Thr Val Ala Asp Gly Lys Val Thr Ala Pro Asn Asn Gly Asp Gly Lys
1705 1710 1715

[illegible] TTA GCG GAT GCG TTA AAT AAA TTA AGC 5358
[illegible] Leu Ala Asp Ala Leu Asn Lys Leu Ser
1720 1725 1730
TGG ACG GCA ACT GCT GGT AAA GAA GGC ACT GGT GAA GTT GAT CCT GCA 5406
Trp Thr Ala Thr Ala Gly Lys Glu Gly Thr Gly Glu Val Asp Pro Ala
1735 1740 1745

AAT TCA GCA GGG CAA GAA GTC AAA GCG GGC GAC AAA GTA ACC TTT AAA 5454
Asn Ser Ala Gly Gln Glu Val Lys Ala Gly Asp Lys Val Thr Phe Lys
1750 1755 1760

GCC GGC GAC AAC CTG AAA ATC AAA CAA AGC GGC AAA GAC TTT ACC TAC 5502
Ala Gly Asp Asn Leu Lys Ile Lys Gln Ser Gly Lys Asp Phe Thr Tyr
1765 1770 1775 1780

TCG CTG AAA AAA GAG CTG AAA GAC CTG ACC AGC GTA GAG TTC AAA GAC 5550
Ser Leu Lys Lys Glu Leu Lys Asp Leu Thr Ser Val Glu Phe Lys Asp
1785 1790 1795

GCA AAC GGC GGT ACA GGC AGT GAA AGC ACC AAG ATT ACC AAA GAC GGC 5598
Ala Asn Gly Gly Thr Gly Ser Glu Ser Thr Lys Ile Thr Lys Asp Gly
1800 1805 1810

TTG ACC ATT ACG CCG GCA AAC GGT GCG GGT GCG GCA GGT GCA AAC ACT 5646
Leu Thr Ile Thr Pro Ala Asn Gly Ala Gly Ala Ala Gly Ala Asn Thr
1815 1820 1825

GCA AAC ACC ATT AGC GTA ACC AAA GAT GGC ATT AGC GCG GGT AAT AAA 5694
Ala Asn Thr Ile Ser Val Thr Lys Asp Gly Ile Ser Ala Gly Asn Lys
1830 1835 1840

GCA GTT ACA AAC GTT GTG AGC GGA CTG AAG AAA TTT GGT GAT GGT CAT 5742
Ala Val Thr Asn Val Val Ser Gly Leu Lys Lys Phe Gly Asp Gly His
1845 1850 1855 1860

ACG TTG GCA AAT GGC ACT GTT GCT GAT TTT GAA AAG CAT TAT GAC AAT 5790
Thr Leu Ala Asn Gly Thr Val Ala Asp Phe Glu Lys His Tyr Asp Asn
1865 1870 1875

GCC TAT AAA GAC TTG ACC AAT TTG GAT GAA AAA GGC GCG GAT AAT AAT 5838
Ala Tyr Lys Asp Leu Thr Asn Leu Asp Glu Lys Gly Ala Asp Asn Asn
1880 1885 1890

CCG ACT GTT GCC GAC AAT ACC GCT GCA ACC GTG GGC GAT TTG CGC GGC 5886
Pro Thr Val Ala Asp Asn Thr Ala Ala Thr Val Gly Asp Leu Arg Gly
1895 1900 1905

TTG GGC TGG GTC ATT TCT GCG GAC AAA ACC ACA GGC GAA CCC AAT CAG 5934
Leu Gly Trp Val Ile Ser Ala Asp Lys Thr Thr Gly Glu Pro Asn Gln
1910 1915 1920

GAA TAC AAC GCG CAA GTG CGT AAC GCC AAT GAA GTG AAA TTC AAG AGC 5982
Glu Tyr Asn Ala Gln Val Arg Asn Ala Asn Glu Val Lys Phe Lys Ser
1925 1930 1935 1940

GGC AAC GGT ATC AAT GTT TCC GGT AAA ACA TTG AAC GGT ACG CGC GTG 6030
Gly Asn Gly Ile Asn Val Ser Gly Lys Thr Leu Asn Gly Thr Arg Val
1945 1950 1955
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R _ gaa GTG GTT AAA TCG AAT GAA TTT 

AT T ACC TTT GAA TTG GCT AAA GGC GAA ^ ^ ^ phe 

He Thr Phe Glu^Leu Ala Lys Gly i97Q 

ACC GTT AAG ART asp Gly Ser Glu Thr Asn Leu Val Lys Val Gly 

Thr val Lys Asn Ala Asp Gly Ser ex i985 

1975 1980 

Asp Met Tyr Tyr Ser Lys Glu Asp He Asp ^ 
1990 199 

ATG ACA GGT AAA ACT GAA AAA TAT AAG GTT GAA AAC GGC AAA GTC GTT
Met Thr Gly Lys Thr Glu Lys Tyr Lys Val Glu Asn Gly Lys Val Val
2005 2010 2015 2020

TCT GCT AAC GGC AGC AAG ACC GAA GTT ACC CTA ACC AAC AAA GGT TCC 
Ser Ala Asn Gly Ser Lys Thr Glu Val Thr Leu Thr Asn Lys Gly Ser
2025 2030 2035

Gly Tyr Val Thr Gly Asn Gin Val Ala Asp 2<)50 
2040 

Phe Glu Leu Gly Leu Ala Asp Ala Ala e ^ 
2055 2060 

GAA AGC GCA AAA GAC AAG CAA TTG TCT AAA GAT AAA GCG GAA ACT GTA
Glu Ser Ala Lys Asp Lys Gln Leu Ser Lys Asp Lys Ala Glu Thr Val

AAT r CAC GAT AAA GTC CGT TTT GCT AAT GGT £ £ £ £ £ 
Asn Ala His Asp Lys Val Arg Phe Ala Asn y 21QQ 
2085 2090 

_. T GCA AAC GGC GAT AAA GTG ACC 

- s s s - - - ir 01y up tys «~ 

»cc m - r« ~ «. « = « ™ s s s s 

Thr Thr Phe Val Lys Thr Asp Val Glu 2i30 
2120 

ART ACC CA, CCA AAC «, «» S »«S S « ^ ^ 

Asn Thr Asp Ala Asn Gly Asn Lys lie ^ 
2135 

TGG TAT GAA CTG AAT GCT GAT GGT ACG GCG AGT AAC AAA GAA GTG ACA
Trp Tyr Glu Leu Asn Ala Asp Gly Thr Ala Ser Asn Lys Glu Val Thr
2150 2155 

CTT GGT AAC GTG GAT GCA AAC GGT AAG AAA GTT GTG AAA GTA ACC GAA
Leu Gly Asn Val Asp Ala Asn Gly Lys Lys Val Val Lys Val Thr Glu

2165 217 



[Line-end nucleotide positions for the preceding sequence lines: 6078, 6126, 6174, 6222, 6270, 6318, 6366, 6414, 6462, 6510, 6558, 6606, 6654, 6702]
AAT GGT GCG GAT AAG TGG TAT TAC ACC AAT GCT GAC GGT GCT GCG GAT 6750
Asn Gly Ala Asp Lys Trp Tyr Tyr Thr Asn Ala Asp Gly Ala Ala Asp
2185 2190 2195

AAA ACC AAA GGC GAA GTG AGC AAT GAT AAA GTT TCT ACC GAT GAA AAA 6798
Lys Thr Lys Gly Glu Val Ser Asn Asp Lys Val Ser Thr Asp Glu Lys
2200 2205 2210

CAC GTT GTC CGC CTT GAT CCG AAC AAT GAA TCG AAC GGC AAA GGC GTG 6846
His Val Val Arg Leu Asp Pro Asn Asn Gln Ser Asn Gly Lys Gly Val
2215 2220 2225

GTC ATT GAC AAT GTG GCT AAT GGC GAA ATT TCT GCC ACT TCC ACC GAT 6894
Val Ile Asp Asn Val Ala Asn Gly Glu Ile Ser Ala Thr Ser Thr Asp
2230 2235 2240

GCG ATT AAC GGA AGT CAG TTG TAT GCC GTG GCA AAA GGG GTA ACA AAC 6942
Ala Ile Asn Gly Ser Gln Leu Tyr Ala Val Ala Lys Gly Val Thr Asn
2245 2250 2255 2260

CTT GCT GGA CAA GTG AAT AAT CTT GAG GGC AAA GTG AAT AAA GTG GGC 6990
Leu Ala Gly Gln Val Asn Asn Leu Glu Gly Lys Val Asn Lys Val Gly
2265 2270 2275

AAA CGT GCA GAT GCA GGT ACA GCA AGT GCA TTA GCG GCT TCA CAG TTA 7038
Lys Arg Ala Asp Ala Gly Thr Ala Ser Ala Leu Ala Ala Ser Gln Leu
2280 2285 2290

CCA CAA GCC ACT ATG CCA GGT AAA TCA ATG GTT GCT ATT GCG GGA AGT 7086
Pro Gln Ala Thr Met Pro Gly Lys Ser Met Val Ala Ile Ala Gly Ser
2295 2300 2305

AGT TAT CAA GGT CAA AAT GGT TTA GCT ATC GGG GTA TCA AGA ATT TCC 7134
Ser Tyr Gln Gly Gln Asn Gly Leu Ala Ile Gly Val Ser Arg Ile Ser
2310 2315 2320

GAT AAT GGC AAA GTG ATT ATT CGC TTG TCA GGC ACA ACC AAT AGT CAA 7182
Asp Asn Gly Lys Val Ile Ile Arg Leu Ser Gly Thr Thr Asn Ser Gln
2325 2330 2335 2340

GGT AAA ACA GGC GTT GCA GCA GGT GTT GGT TAC CAG TGG TAAAGTTTGG 7231
Gly Lys Thr Gly Val Ala Ala Gly Val Gly Tyr Gln Trp
2345 2350

ATTATCTCTC TTAAAAAGCG GCATTTGCCG CTTTTTTTAT GGGTGGCTAT TATGTATCGT 7291 



(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2353 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:
Met Asn Lys X- Phe - val ne X. Asn ~ ~ ~ - * « 
^ val val ser -1 - Thr Ar 9 - H iS Thr .vs „ - Ser Ala 

20 

, , * 1a Thr Leu Leu Phe Ala Thr Val Gin 
Thr Val Glu Thr Ala Val Leu Ala Thr Leu ^ 

» Asn All Thr Asp Glu Asp Glu Glu Leu Asp « Val Val Ar 9 Thr 

Ua l Val Leu Ser Phe His Ser Asp L y s Glu GlV Thr Gl y Glu L.s 

Glu Val Thr Glu Asn Ser Asn Trp Gly Ile Tyr Phe Asp Asn Lys Gly

Val Leu Lys Ala Gly Ala Ile Thr Leu Lys Ala Gly Asp Asn Leu Lys
100 

Ile ws s - ~, - s - - - ~ s pte "* " 

c Pr val Ala Thr Glu Lys 

_ T „. T hr Asp Leu Thr ser vdi ~ 

Ser Leu Lys Lys Asp Leu Thr as P ^ q 

^ 2 « «y »■ S •» - «" V ' 1 S ~ " * 

« «, - - - «■ - - s oly ~ val " ~ 

165 

Gly , u „ P « - - - - - ~ ™ - Thr s val ~ 
s « « s« 1" ». - » - - « - - z m Ma ua 

195 

Thr « W -P - - - - « - S 

Thr T. 1, «V - - - - - 5 W S " " a " " 

2 v>1 «. ~ - - - - ». s « - « - E - 

Tht «. - «y - - - - - - ~ is pto Lys 

260 

Thr Ser Val Ile Lys Glu Lys Asp Gly Lys Leu Phe Thr Gly Lys Glu
275 280 285
Asn Asn Asp Thr Asn Lys Val Thr Ser Asn Thr Ala Thr Asp Asn Thr
290 295 300

Asp Glu Gly Asn Gly Leu Val Thr Ala Lys Ala Val Ile Asp Ala Val
305 310 315 320

Asn Lys Ala Gly Trp Arg Val Lys Thr Thr Thr Ala Asn Gly Gln Asn
325 330 335

Gly Asp Phe Ala Thr Val Ala Ser Gly Thr Asn Val Thr Phe Glu Ser
340 345 350

Gly Asp Gly Thr Thr Ala Ser Val Thr Lys Asp Thr Asn Gly Asn Gly
355 360 365

Ile Thr Val Lys Tyr Asp Ala Lys Val Gly Asp Gly Leu Lys Phe Asp
370 375 380

Ser Asp Lys Lys Ile Val Ala Asp Thr Thr Ala Leu Thr Val Thr Gly
385 390 395 400

Gly Lys Val Ala Glu Ile Ala Lys Glu Asp Asp Lys Lys Lys Leu Val
405 410 415

Asn Ala Gly Asp Leu Val Thr Ala Leu Gly Asn Leu Ser Trp Lys Ala
420 425 430

Lys Ala Glu Ala Asp Thr Asp Gly Ala Leu Glu Gly Ile Ser Lys Asp
435 440 445

Gln Glu Val Lys Ala Gly Glu Thr Val Thr Phe Lys Ala Gly Lys Asn
450 455 460

Leu Lys Val Lys Gln Asp Gly Ala Asn Phe Thr Tyr Ser Leu Gln Asp
465 470 475 480

Ala Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Gly Thr Thr Asn Gly
485 490 495

Gly Asn Asp Ala Lys Thr Val Ile Asn Lys Asp Gly Leu Thr Ile Thr
500 505 510

Pro Ala Gly Asn Gly Gly Thr Thr Gly Thr Asn Thr Ile Ser Val Thr
515 520 525

Lys Asp Gly Ile Lys Ala Gly Asn Lys Ala Ile Thr Asn Val Ala Ser
530 535 540

Gly Leu Arg Ala Tyr Asp Asp Ala Asn Phe Asp Val Leu Asn Asn Ser
545 550 555 560

Ala Thr Asp Leu Asn Arg His Val Glu Asp Ala Tyr Lys Gly Leu Leu
565 570 575

Asn Leu Asn Glu Lys Asn Ala Asn Lys Gln Pro Leu Val Thr Asp Ser
580 585 590
Thr Ala Ala Thr Val Gly Asp Leu Arg Lys Leu Gly Trp Val Val Ser
595 600 605

Thr Lys Asn Gly Thr Lys Glu Glu Ser As* Gln Val Lys Gln Ala Asp
610 615 620

Glu Val Leu Phe Thr Gly Ala Gly Ala Ala Thr Val Thr Ser Lys Ser
625 630 635 640

4 „, , /=l1 e pr val Ala Glu Thr Lys Ala 

Glu Asn Gly Lys His Thr He Thr Val Ser Val Aia ^ 

645 650 

„ p cy, sly «» Glu Ly ~P «V «P «« XI. Ly "» £ v.! »P 

660 665 

d„ » Thr J» v.1 u. Thr v.! sly »» «» sly Thr ». 

675 680 



Val Thr Lys Gly Gly Phe Glu Thr Val Lys Thr Gly Ala Thr Asp Ala

690 695 
Asp Arg Gly Lys Val Thr Val Lys Asp Ala Thr Ala Asn Asp Ala Asp 

705 710 

Lys Lys Val Ala Thr Val Lys Asp Val Ala Thr Ala Ile Asn Ser Ala

725 730 

M. Thr Phe V.1 -y. Thr Glu _ L« Thr Thr s.r II. £ Glu «P 

740 745 
Asn Pro Thr Asp Asn Gly Lys Asp Asp Ala Leu Lys Ala Gly Asp Thr 

755 760 
Leu Thr Phe Lys Ala Gly Lys Asn Leu Lys Val Lys Arg Asp Gly Lys

770 775 

Asn Ile Thr Phe Asp Leu Ala Lys Asn Leu Glu Val Lys Thr Ala Lys

785 790

Val Ser Asp Thr Leu Thr Ile Gly Gly Asn Thr Pro Thr Gly Gly Thr
805 810 815

Thr Ala Thr Pro Lys Val Asn Ile Thr Ser Thr Ala Asp Gly Leu Asn
820 825 830

Phe Ala Lys Glu Thr Ala Asp Ala Ser Gly Ser Lys Asn Val Tyr Leu
835 840 845

Lys Gly Ile Ala Thr Thr Leu Thr Glu Pro Ser Ala Gly Ala Lys Ser
850 855 860

Ser His Val Asp Leu Asn Val Asp Ala Thr Lys Lys Ser Asn Ala Ala
865 870 875 880

Ser Ile Glu Asp Val Leu Arg Ala Gly Trp Asn Ile Gln Gly Asn Gly
885 890 895
Asn Asn Val Asp Tyr Val Ala Thr Tyr Asp Thr Val Asn Phe Thr Asp
900 905 910

Asp Ser Thr Gly Thr Thr Thr Val Thr Val Thr Gln Lys Ala Asp Gly
915 920 925

Lys Gly Ala Asp Val Lys Ile Gly Ala Lys Thr Ser Val Ile Lys Asp
930 935 940

His Asn Gly Lys Leu Phe Thr Gly Lys Asp Leu Lys Asp Ala Asn Asn
945 950 955 960

Gly Ala Thr Val Ser Glu Asp Asp Gly Lys Asp Thr Gly Thr Gly Leu
965 970 975

Val Thr Ala Lys Thr Val Ile Asp Ala Val Asn Lys Ser Gly Trp Arg
980 985 990

Val Thr Gly Glu Gly Ala Thr Ala Glu Thr Gly Ala Thr Ala Val Asn
995 1000 1005

Ala Gly Asn Ala Glu Thr Val Thr Ser Gly Thr Ser Val Asn Phe Lys
1010 1015 1020

Asn Gly Asn Ala Thr Thr Ala Thr Val Ser Lys Asp Asn Gly Asn Ile
1025 1030 1035 1040

Asn Val Lys Tyr Asp Val Asn Val Gly Asp Gly Leu Lys Ile Gly Asp
1045 1050 1055

Asp Lys Lys Ile Val Ala Asp Thr Thr Thr Leu Thr Val Thr Gly Gly
1060 1065 1070

Lys Val Ser Val Pro Ala Gly Ala Asn Ser Val Asn Asn Asn Lys Lys
1075 1080 1085

Leu Val Asn Ala Glu Gly Leu Ala Thr Ala Leu Asn Asn Leu Ser Trp
1090 1095 1100

Thr Ala Lys Ala Asp Lys Tyr Ala Asp Gly Glu Ser Glu Gly Glu Thr
1105 1110 1115 1120

Asp Gln Glu Val Lys Ala Gly Asp Lys Val Thr Phe Lys Ala Gly Lys
1125 1130 1135

Asn Leu Lys Val Lys Gln Ser Glu Lys Asp Phe Thr Tyr Ser Leu Gln
1140 1145 1150

Asp Thr Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Gly Thr Ala Asn
1155 1160 1165

Gly Arg Asn Asp Thr Gly Thr Val Ile Asn Lys Asp Gly Leu Thr Ile
1170 1175 1180

Thr Leu Ala Asn Gly Ala Ala Ala Gly Thr Asp Ala Ser Asn Gly Asn
1185 1190 1195 1200
1185 1190 1195 1200 
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^ Ile s.r v., » «. "P «» n. ~ - - - S» 5 "« 

1205 

« » v* ». ~ - - £, W " P ~ S."~ ™ 

1220 X " S 

ua „ * «. « »• «j - - «» «• ~ 

1235 



Kn olu «x «. - « -t«. - -» ». S ~ ~ - 

1250 1255 

• ~w v a i Thr He Asp Val Ala Glu Ala 
Thr Asp Asn Asn Gly Itf. J Hx. Thr Val Thr 11^ 

1265 12 

WB v.> «y up «y - «» - - ™ w G1V - ~ S "° 

1285 

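Lys Val Asp Asn Thr Asp Gly Asn Asn Leu Leu Thr Val Asp Ala Thr
1300 1305 1310

Lys Gly Ala Ser Val Ala Lys Gly Glu Phe Asn Ala Val Thr Thr Asp
1315 1320 1325

Ala Thr Thr Ala Gln Gly Thr Asn Ala Asn Glu Arg Gly Lys Val Val
1330 1335 1340

Val Lys Gly Ser Asn Gly Ala Thr Ala Thr Glu Thr Asp Lys Lys Lys
1345 1350 1355 1360

Val Ala Thr Val Gly Asp Val Ala Lys Ala Ile Asn Asp Ala Ala Thr
1365 1370 1375

Phe Val Lys Val Glu Asn Asp Asp Ser Ala Thr Ile Asp Asp Ser Pro
1380 1385 1390

Thr Asp Asp Gly Ala Asn Asp Ala Leu Lys Ala Gly Asp Thr Leu Thr
1395 1400 1405

Leu Lys Ala Gly Lys Asn Leu Lys Val Lys Arg Asp Gly Lys Asn Ile
1410 1415 1420

Thr Phe Ala Leu Ala Asn Asp Leu Ser Val Lys Ser Ala Thr Val Ser
1425 1430 1435 1440

Asp Lys Leu Ser Leu Gly Thr Asn Gly Asn Lys Val Asn Ile Thr Ser
1445 1450 1455

Asp Thr Lys Gly Leu Asn Phe Ala Lys Asp Ser Lys Thr Gly Asp Asp
1460 1465 1470

Ala Asn Ile His Leu Asn Gly Ile Ala Ser Thr Leu Thr Asp Thr Leu
1475 1480 1485

Leu Asn Ser Gly Ala Thr Thr Asn Leu Gly Gly Asn Gly Ile Thr Asp
1490 1495 1500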
Asn Glu Lys Lys Arg Ala Ala Ser Val Lys Asp Val Leu Asn Ala Gly
1505 1510 1515 1520

Trp Asn Val Arg Gly Val Lys Pro Ala Ser Ala Asn Asn Gln Val Glu
1525 1530 1535

Asn Ile Asp Phe Val Ala Thr Tyr Asp Thr Val Asp Phe Val Ser Gly
1540 1545 1550

Asp Lys Asp Thr Thr Ser Val Thr Val Glu Ser Lys Asp Asn Gly Lys
1555 1560 1565

Arg Thr Glu Val Lys Ile Gly Ala Lys Thr Ser Val Ile Lys Asp His
1570 1575 1580

Asn Gly Lys Leu Phe Thr Gly Lys Glu Leu Lys Asp Ala Asn Asn Asn
1585 1590 1595 1600

Gly Val Thr Val Thr Glu Thr Asp Gly Lys Asp Glu Gly Asn Gly Leu
1605 1610 1615

Val Thr Ala Lys Ala Val Ile Asp Ala Val Asn Lys Ala Gly Trp Arg
1620 1625 1630

Val Lys Thr Thr Gly Ala Asn Gly Gln Asn Asp Asp Phe Ala Thr Val
1635 1640 1645

Ala Ser Gly Thr Asn Val Thr Phe Ala Asp Gly Asn Gly Thr Thr Ala
1650 1655 1660

Glu Val Thr Lys Ala Asn Asp Gly Ser Ile Thr Val Lys Tyr Asn Val
1665 1670 1675 1680

Lys Val Ala Asp Gly Leu Lys Leu Asp Gly Asp Lys Ile Val Ala Asp
1685 1690 1695

Thr Thr Val Leu Thr Val Ala Asp Gly Lys Val Thr Ala Pro Asn Asn
1700 1705 1710

Gly Asp Gly Lys Lys Phe Val Asp Ala Ser Gly Leu Ala Asp Ala Leu
1715 1720 1725

Asn Lys Leu Ser Trp Thr Ala Thr Ala Gly Lys Glu Gly Thr Gly Glu
1730 1735 1740

Val Asp Pro Ala Asn Ser Ala Gly Gln Glu Val Lys Ala Gly Asp Lys
1745 1750 1755 1760

Val Thr Phe Lys Ala Gly Asp Asn Leu Lys Ile Lys Gln Ser Gly Lys
1765 1770 1775

Asp Phe Thr Tyr Ser Leu Lys Lys Glu Leu Lys Asp Leu Thr Ser Val
1780 1785 1790

Glu Phe Lys Asp Ala Asn Gly Gly Thr Gly Ser Glu Ser Thr Lys Ile
1795 1800 1805
Thr Lys Asp Gly Leu Thr Ile Thr Pro Ala Asn Gly Ala Gly Ala Ala
1810 1815 1820

Gly Ala Asn Thr Ala Asn Thr Ile Ser Val Thr Lys Asp Gly Ile Ser
1825 1830 1835 1840

Ala Gly Asn Lys Ala Val Thr Asn Val Val Ser Gly Leu Lys Lys Phe
1845 1850 1855

Gly Asp Gly His Thr Leu Ala Asn Gly Thr Val Ala Asp Phe Glu Lys
1860 1865 1870

His Tyr Asp Asn Ala Tyr Lys Asp Leu Thr Asn Leu Asp Glu Lys Gly
1875 1880 1885

Ala Asp Asn Asn Pro Thr Val Ala Asp Asn Thr Ala Ala Thr Val Gly
1890 1895 1900

Asp Leu Arg Gly Leu Gly Trp Val Ile Ser Ala Asp Lys Thr Thr Gly
1905 1910 1915 1920

Glu Pro Asn Gln Glu Tyr Asn Ala Gln Val Arg Asn Ala Asn Glu Val
1925 1930 1935

Lys Phe Lys Ser Gly Asn Gly Ile Asn Val Ser Gly Lys Thr Leu Asn
1940 1945 1950

Gly Thr Arg Val Ile Thr Phe Glu Leu Ala Lys Gly Glu Val Val Lys
1955 1960 1965

Ser Asn Glu Phe Thr Val Lys Asn Ala Asp Gly Ser Glu Thr Asn Leu
1970 1975 1980

Val Lys Val Gly Asp Met Tyr Tyr Ser Lys Glu Asp Ile Asp Pro Ala
1985 1990 1995 2000

Thr Ser Lys Pro Met Thr Gly Lys Thr Glu Lys Tyr Lys Val Glu Asn
2005 2010 2015

Gly Lys Val Val Ser Ala Asn Gly Ser Lys Thr Glu Val Thr Leu Thr
2020 2025 2030

Asn Lys Gly Ser Gly Tyr Val Thr Gly Asn Gln Val Ala Asp Ala Ile
2035 2040 2045

Ala Lys Ser Gly Phe Glu Leu Gly Leu Ala Asp Ala Ala Glu Ala Glu
2050 2055 2060

Lys Ala Phe Ala Glu Ser Ala Lys Asp Lys Gln Leu Ser Lys Asp Lys
2065 2070 2075 2080

Ala Glu Thr Val Asn Ala His Asp Lys Val Arg Phe Ala Asn Gly Leu
2085 2090 2095

Asn Thr Lys Val Ser Ala Ala Thr Val Glu Ser Thr Asp Ala Asn Gly
2100 2105 2110
Asp Lys Val Thr Thr Thr Phe Val Lys Thr Asp Val Glu Leu Pro Leu
2115 2120 2125

Thr Gln Ile Tyr Asn Thr Asp Ala Asn Gly Asn Lys Ile Val Lys Lys
2130 2135 2140

Ala Asp Gly Lys Trp Tyr Glu Leu Asn Ala Asp Gly Thr Ala Ser Asn
2145 2150 2155 2160

Lys Glu Val Thr Leu Gly Asn Val Asp Ala Asn Gly Lys Lys Val Val
2165 2170 2175

Lys Val Thr Glu Asn Gly Ala Asp Lys Trp Tyr Tyr Thr Asn Ala Asp
2180 2185 2190

Gly Ala Ala Asp Lys Thr Lys Gly Glu Val Ser Asn Asp Lys Val Ser
2195 2200 2205

Thr Asp Glu Lys His Val Val Arg Leu Asp Pro Asn Asn Gln Ser Asn
2210 2215 2220

Gly Lys Gly Val Val Ile Asp Asn Val Ala Asn Gly Glu Ile Ser Ala
2225 2230 2235 2240

Thr Ser Thr Asp Ala Ile Asn Gly Ser Gln Leu Tyr Ala Val Ala Lys
2245 2250 2255

Gly Val Thr Asn Leu Ala Gly Gln Val Asn Asn Leu Glu Gly Lys Val
2260 2265 2270

Asn Lys Val Gly Lys Arg Ala Asp Ala Gly Thr Ala Ser Ala Leu Ala
2275 2280 2285

Ala Ser Gln Leu Pro Gln Ala Thr Met Pro Gly Lys Ser Met Val Ala
2290 2295 2300

Ile Ala Gly Ser Ser Tyr Gln Gly Gln Asn Gly Leu Ala Ile Gly Val
2305 2310 2315 2320

Ser Arg Ile Ser Asp Asn Gly Lys Val Ile Ile Arg Leu Ser Gly Thr
2325 2330 2335

Thr Asn Ser Gln Gly Lys Thr Gly Val Ala Ala Gly Val Gly Tyr Gln
2340 2345 2350

Trp



(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 658 amino acids

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

r « ^ n. r - v., n. TO » v. v., * «» ~ «, 

L Va> v.! « L T ta « ~ » iS »r W C f . ~ - 



20 



VaX «! V.. « JU - - - - ^ * ~ ~ ■ 

u . „ I » * « - - - - - a- " a ^ 01y " p 

^ 1 p„. ~ - »« MP " a 01u Lvs - 

70 

«1 «. «. Ma Tyr -vs «, - - « ^ ^ » ^ 

85 

Ser Asp Lys Leu Leu Val Glu Asp Asn Thr Ala Ala Thr Val Gly Asn
100 105 110

Leu Arg Lys Leu Gly Trp Val Leu Ser Ser Lys Asn Gly Thr Arg Asn
115 120 125

„ u £ ~ «. «. V- £ HU »U «P «• « « - »» 

£ Tv «, V.! «. - T te S„ * ~ «. - «, - * 

ul Thr Ala «. 1 - «> - £ ^ ^ ™ IS ~ 

1G5 

Ser Asp Thr Leu Thr Ile Gly Gly Gly Ala Ala Ala Gly Ala Thr Thr
180 185 190

Thr Pro Lys Val Asn Val Thr Ser Thr Thr Asp Gly Leu Lys Phe Ala
195 200 205

Lys Asp Ala Ala Gly Ala Asn Gly Asp Thr Thr Val His Leu Asn Gly
210 215 220

Ile Gly Ser Thr Leu Thr Asp Thr Leu Val Gly Ser Pro Ala Thr His
225 230 235 240

Ile Asp Gly Gly Asp Gln Ser Thr His Tyr Thr Arg Ala Ala Ser Ile
245 250 255

Lys Asp Val Leu Asn Ala Gly Trp Asn Ile Lys Gly Val Lys Ala Gly
260 265 270



WO 96/30519 PCT/US96/04031

Ser Thr Thr Gly Gln Ser Glu Asn Val Asp Phe Val His Thr Tyr Asp
275 280 285

Thr Val Glu Phe Leu Ser Ala Asp Thr Glu Thr Thr Thr Val Thr Val
290 295 300

Asp Ser Lys Glu Asn Gly Lys Arg Thr Glu Val Lys Ile Gly Ala Lys
305 310 315 320

Thr Ser Val Ile Lys Glu Lys Asp Gly Lys Leu Phe Thr Gly Lys Ala
325 330 335

Asn Lys Glu Thr Asn Lys Val Asp Gly Ala Asn Ala Thr Glu Asp Ala
340 345 350

Asp Glu Gly Lys Gly Leu Val Thr Ala Lys Asp Val Ile Asp Ala Val
355 360 365

Asn Lys Thr Gly Trp Arg Ile Lys Thr Thr Asp Ala Asn Gly Gln Asn
370 375 380

Gly Asp Phe Ala Thr Val Ala Ser Gly Thr Asn Val Thr Phe Ala Ser
385 390 395 400

Gly Asn Gly Thr Thr Ala Thr Val Thr Asn Gly Thr Asp Gly Ile Thr
405 410 415

Val Lys Tyr Asp Ala Lys Val Gly Asp Gly Leu Lys Leu Asp Gly Asp
420 425 430

Lys Ile Ala Ala Asp Thr Thr Ala Leu Thr Val Asn Asp Gly Lys Asn
435 440 445

Ala Asn Asn Pro Lys Gly Lys Val Ala Asp Val Ala Ser Thr Asp Glu
450 455 460

Lys Lys Leu Val Thr Ala Lys Gly Leu Val Thr Ala Leu Asn Ser Leu
465 470 475 480

Ser Trp Thr Thr Thr Ala Ala Glu Ala Asp Gly Gly Thr Leu Asp Gly
485 490 495

Asn Ala Ser Glu Gln Glu Val Lys Ala Gly Asp Lys Val Thr Phe Lys
500 505 510

Ala Gly Lys Asn Leu Lys Val Lys Gln Glu Gly Ala Asn Phe Thr Tyr
515 520 525

Ser Leu Gln Asp Ala Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Thr
530 535 540

Gly Asn Asn Gly Ala Lys Thr Glu Ile Asn Lys Asp Gly Leu Thr Ile
545 550 555 560

Thr Pro Ala Asn Gly Ala Gly Ala Asn Asn Ala Asn Thr Ile Ser Val
565 570 575
Thr Lys Asp Gly Ile Ser Ala Gly Gly Gln Ser Val Lys Asn Val Val
580 585 590

Ser «y U. «. UT. — «» 2 "° " P S 

595 600 

s „ «. «p - « - ^ - - «> *** f 2 o - ° 1V 

610 615 



Thr Asn Leu Asp Glu Lys Gly Thr Asp Lys Gln Thr Pro Val Val Ala
625 630 635 640

Asp Asn Thr Ala Ala Thr Val Gly Asp Leu Arg Gly Leu Gly Trp Val
645 650 655

Ile Ser

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 607 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 



Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Met Thr Gln Thr Trp
1 5 10 15



V,! V,! V,! _ Hi. ». ,ys «, L- «. <~> 

20 " 

„ «„ «P «o V,! ~ - ~ - »° If « G1 " " a 

35 40 



„ «a T„r c lu ~P «■ «» - « « » l ™ 

50 55 

Pro Val Leu Ser Phe His Ser Asp Lys Glu Gly Thr Gly Glu Lys Glu
65 70 75 80

Val Thr Glu Asn Ser Asn Trp Gly Ile Tyr Phe Asp Asn Lys Gly Val
85 90 95

Leu Lys Ala Gly Ala Ile Thr Leu Lys Ala Gly Asp Asn Leu Lys Xaa
100 105 110

Lys Gln Xaa Thr Asp Glu Xaa Thr Asn Ala Ser Ser Phe Thr Tyr Ser
115 120 125

Leu Lys Lys Asp Leu Thr Asp Leu Thr Ser Val Ala Thr Glu Lys Leu
130 135 140
130 
Ser Phe Gly Ala Asn Gly Asp Lys Val Asp Ile Thr Ser Asp Ala Asn
145 150 155 160

Gly Leu Lys Leu Ala Lys Thr Gly Asn Gly Asn Val His Leu Asn Gly
165 170 175

Leu Asp Ser Thr Leu Pro Asp Ala Val Thr Asn Thr Gly Val Leu Ser
180 185 190

Ser Ser Ser Phe Thr Pro Asn Asp Val Glu Lys Thr Arg Ala Ala Thr
195 200 205

Val Lys Asp Val Leu Asn Ala Gly Trp Asn Ile Lys Gly Ala Lys Thr
210 215 220

Ala Gly Gly Asn Val Glu Ser Val Asp Leu Val Ser Ala Tyr Asn Asn
225 230 235 240

Val Glu Phe Ile Thr Gly Asp Lys Asn Thr Leu Asp Val Val Leu Thr
245 250 255

Ala Lys Glu Asn Xaa Lys Thr Thr Glu Val Lys Phe Thr Pro Lys Thr
260 265 270

Ser Val Ile Lys Glu Lys Asp Gly Lys Leu Phe Thr Gly Lys Glu Asn
275 280 285

Asn Asp Thr Asn Lys Val Thr Ser Asn Thr Ala Thr Asp Asn Thr Asp
290 295 300

Glu Gly Asn Gly Leu Val Thr Ala Lys Ala Val Ile Asp Ala Val Asn
305 310 315 320

Lys Ala Gly Trp Arg Val Lys Thr Thr Thr Ala Asn Gly Gln Asn Gly
325 330 335

Asp Phe Ala Thr Val Ala Ser Gly Thr Asn Val Thr Phe Glu Ser Gly
340 345 350

Asp Gly Thr Thr Ala Ser Val Thr Lys Asp Thr Asn Gly Asn Gly Ile
355 360 365

Thr Val Lys Tyr Asp Ala Lys Val Gly Asp Gly Leu Lys Phe Asp Ser
370 375 380

Asp Lys Lys Ile Val Ala Asp Thr Thr Ala Leu Thr Val Thr Gly Gly
385 390 395 400

Lys Val Ala Glu Ile Ala Lys Glu Asp Asp Lys Lys Lys Leu Val Asn
405 410 415

Ala Gly Asp Leu Val Thr Ala Leu Gly Asn Leu Ser Trp Lys Ala Lys
420 425 430

Ala Glu Ala Asp Thr Asp Gly Ala Leu Glu Gly Ile Ser Lys Asp Gln
435 440 445
Glu Val Lys Ala Gly Glu Thr Val Thr Phe Lys Ala Gly Lys Asn Leu
450 455 460

Lys Val Lys Gln Asp Gly Ala Asn Phe Thr Tyr Ser Leu Gln Asp Ala
465 470 475 480

Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Gly Thr Thr Asn Gly Gly
485 490 495

Asn Asp Ala Lys Thr Val Ile Asn Lys Asp Gly Leu Thr Ile Thr Pro
500 505 510

Ala Gly Asn Gly Gly Thr Thr Gly Thr Asn Thr Ile Ser Val Thr Lys
515 520 525

Asp Gly Ile Lys Ala Gly Asn Lys Ala Ile Thr Asn Val Ala Ser Gly
530 535 540

Leu Arg Ala Tyr Asp Asp Ala Asn Phe Asp Val Leu Asn Asn Ser Ala
545 550 555 560

Thr Asp Leu Asn Arg His Val Glu Asp Ala Tyr Lys Gly Leu Leu Asn
565 570 575

Leu Asn Glu Lys Asn Ala Asn Lys Gln Pro Leu Val Thr Asp Ser Thr
580 585 590

Ala Ala Thr Val Gly Asp Leu Arg Lys Leu Gly Trp Val Val Ser
595 600 605

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Met Thr Gln Thr Trp
1 5 10 15



Val Val Val Ser Glu Leu Thr Arg 
20 

(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 
(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr Gln Thr Trp
1 5 10 15

Val Val Val Ser Glu Leu Thr Arg
20

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 amino acids 

(B) TYPE: amino acid

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu
1 5 10 15

Val Ala Val Ser Glu Leu Ala Arg 
20 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Asn Lys Ile Tyr Arg Leu Lys Phe Ser Lys Arg Leu Asn Ala Leu
1 5 10 15 



Val Ala Val Ser Glu Leu Ala Arg 
20 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

« » lv, M. Tvr s.r n. «P £. tt. «r «, a. « »P 



1 5 



Ile Val Ala Ser Glu Leu Ala Arg 
20 



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

„« »„ n. v - tr "* val *™ s y p6e 



1 5 



Ile Ala Val Ser Glu Phe Ala Arg 
20 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Met Asn Lys XI. ^ryr^u Lys Tyr Cys His xi. T h r ^ ser I- 

i 5 

Ile Ala Val Ser Glu Leu Ala Arg 
20 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 2037 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

ATGAACAAAA TTTTTAACGT TATTTGGAAT GTTGTGACTC AAACTTGGGT TGTCGTATCT 60 

GAACTCACTC GCACCCACAC CAAATGCGCC TCCGCCACCG TGGCAGTTGC CGTATTGGCA 120 

ACCCTGTTGT CCGCAACGGT TCAGGCGAAT GCTACCGATG AAAACGAAGA TGATGAAGAA 180 

GAGTTAGAAC CCGTACAACG CTCTGTTTTA AGGTGGAGCT TCAAATCCGC TAAGGAAGGC 240 

ACTGGAGAAC AAGAGGGAAC AACAGAGGTA ATAAATTTGA ACACAGATTC ATCAGGAAAT 300 

GCAGTAGGAA GCAGCACAAT CACCTTCAAA GCCGGCGACA ACCTGAAAAT CAAACAAAGC 360 

GGCAATGACT TCACCTACTC GCTGAAAAAA GAGCTGAAAA ACCTGACCAG TGTTGAAACT 420 

GAAAAATTAT CGTTTGGCGC AAACGGCAAT AAAGTTGATA TTACCAGTGA TGCAAATGGC 480 

TTGAAATTGG CGAAAACAGG TAACGGAAAT GGTCAAAACA GTAATGTTCA CTTAAACGGT 540 

ATTGCTTCGA CTTTGACCGA TACGCTTGCC GGTGGCACAA CAGGACACGT TGACACCAAC 600 

ATTGATGCGG TTAATTATCA TCGCGCTGCA AGCGTACAAG ATGTGTTAAA CAGCGGTTGG 660 

AATATCCAAG GCAATGGAAA CAATGTCGAT TTTGTCCGTA CTTACGACAC CGTGGACTTT 720 

GTCAATGGCG CGAATGCCAA TGTGAGCGTT ACGGCTGATA CGGCTCACAA AAAGACAACT 780 

GTCCGTGTGG ATGTAACAGG CTTGCCGGTT CAATATGTTA CGGAAGACGG CAAAACCGTT 840 

GTGAAAGTGG GCAATGAGTA TTACAAAGCC AAAGATGACG GTTCGGCGGA TATGAATCAA 900 

AAAGTCGAAA ACGGCGAGCT GGCGAAAACC AAAGTGAAAT TGGTATCGGC AAGCGGTACA 960 

AATCCGGTGA AAATTAGCAA TGTTGCAGAC GGCACGGAAG ACACCGATGC GGTCAGCTTT 1020 

AAGCAATTAA AAGCCTTGCA AGACAAACAG GTTACGTTGA GCACGAGCAA TGCTTATGCC 1080 

AATGGCGGTA CAGATAACGA CGGCGGCAAG GCAACTCAAA CTTTAAGCAA TGGTTTGAAT 1140 

TTTAAATTTA AATCTAGCGA TGGCGAGTTG TTGAAAATTA GCGCGACCGG CGATACGGTT 1200 

ACTTTTACGC CGAAAAAAGG TTCGGTACAG GTTGGCGATG ATGGCAAGGC TTCAATTTCA 1260 

AAAGGTGCAA ATACAACTGA AGGTTTGGTT GAGGCTTCTG AATTGGTTGA AAGCCTGAAC 1320 

AAACTGGGTT GGAAAGTAGG GGTTGAGAAA GTCGGCAGCG GCGAGCTTGA TGGTACATCC 1380 

AAGGAAACTT TAGTGAAGTC GGGCGATAAA GTAACTTTGA AAGCCGGCGA CAATCTGAAG 1440 

GTCAAACAAG AGGGCACAAA CTTCACTTAC GCGCTCAAAG ATGAATTGAC GGGCGTGAAG 1500 

AGCGTGGAGT TTAAAGACAC GGCGAATGGT GCAAACGGTG CAAGCACGAA GATTACCAAA 1560 

GACGGCTTGA CCATTACGCT GGCAAACGGT GCGAATGGTG CGACGGTGAC TGATGCCGAC 1620 

AAGATTAAAG TTGCTTCGGA CGGCATTAGC GCGGGTAATA AAGCAGTTAA AAACGTCGCG 1680 

GCAGGCGAAA TTTCTGCCAC TTCCACCGAT GCGATTAACG GAAGCCAGTT GTATGCCGTG 1740 

GCAAAAGGGG TAACAAACCT TGCTGGACAA GTGAATAATC TTGAGGGCAA AGTGAATAAA 1800 

GTGGGCAAAC GTGCAGATGC AGGTACTGCA AGTGCATTAG CGGCTTCACA GTTACCACAA 1860 

GCCACTATGC CAGGTAAATC AATGGTTTCT ATTGCGGGAA GTAGTTATCA AGGTCAAAAT 1920 

GGTTTAGCTA TCGGGGTATC AAGAATTTCC GATAATGGCA AAGTGATTAT TCGCTTGTCT 1980 

GGCACAACCA ATAGTCAAGG TAAAACAGGC GTTGCAGCAG GTGTTGGTTA CCAGTGG 2037 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 679 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr Gln Thr Trp 
1 5 10 15 

Val Val Val Ser Glu Leu Thr Arg Thr His Thr Lys Cys Ala Ser Ala 
20 25 30 

Thr Val Ala Val Ala Val Leu Ala Thr Leu Leu Ser Ala Thr Val Gln 
35 40 45 

Ala Asn Ala Thr Asp Glu Asn Glu Asp Asp Glu Glu Glu Leu Glu Pro 
50 55 60 

Val Gln Arg Ser Val Leu Arg Trp Ser Phe Lys Ser Ala Lys Glu Gly 
65 70 75 80 

Thr Gly Glu Gln Glu Gly Thr Thr Glu Val Ile Asn Leu Asn Thr Asp 
85 90 95 

Ser Ser Gly Asn Ala Val Gly Ser Ser Thr Ile Thr Phe Lys Ala Gly 
100 105 110 

Asp Asn Leu Lys Ile Lys Gln Ser Gly Asn Asp Phe Thr Tyr Ser Leu 
115 120 125 

Lys Lys Glu Leu Lys Asn Leu Thr Ser Val Glu Thr Glu Lys Leu Ser 
130 135 140 



Phe Gly Ala Asn Gly Asn Lys Val Asp Ile Thr Ser Asp Ala Asn Gly 
145 150 155 160 

Leu Lys Leu Ala Lys Thr Gly Asn Gly Asn Gly Gln Asn Ser Asn Val 
165 170 175 

His Leu Asn Gly Ile Ala Ser Thr Leu Thr Asp Thr Leu Ala Gly Gly 
180 185 190 

Thr Thr Gly His Val Asp Thr Asn Ile Asp Ala Val Asn Tyr His Arg 
195 200 205 

Ala Ala Ser Val Gln Asp Val Leu Asn Ser Gly Trp Asn Ile Gln Gly 
210 215 220 

Asn Gly Asn Asn Val Asp Phe Val Arg Thr Tyr Asp Thr Val Asp Phe 
225 230 235 240 

Val Asn Gly Ala Asn Ala Asn Val Ser Val Thr Ala Asp Thr Ala His 
245 250 255 

Lys Lys Thr Thr Val Arg Val Asp Val Thr Gly Leu Pro Val Gln Tyr 
260 265 270 

Val Thr Glu Asp Gly Lys Thr Val Val Lys Val Gly Asn Glu Tyr Tyr 
275 280 285 

Lys Ala Lys Asp Asp Gly Ser Ala Asp Met Asn Gln Lys Val Glu Asn 
290 295 300 

Gly Glu Leu Ala Lys Thr Lys Val Lys Leu Val Ser Ala Ser Gly Thr 
305 310 315 320 

Asn Pro Val Lys Ile Ser Asn Val Ala Asp Gly Thr Glu Asp Thr Asp 
325 330 335 

Ala Val Ser Phe Lys Gln Leu Lys Ala Leu Gln Asp Lys Gln Val Thr 
340 345 350 

Leu Ser Thr Ser Asn Ala Tyr Ala Asn Gly Gly Thr Asp Asn Asp Gly 
355 360 365 

Gly Lys Ala Thr Gln Thr Leu Ser Asn Gly Leu Asn Phe Lys Phe Lys 
370 375 380 

Ser Ser Asp Gly Glu Leu Leu Lys Ile Ser Ala Thr Gly Asp Thr Val 
385 390 395 400 

Thr Phe Thr Pro Lys Lys Gly Ser Val Gln Val Gly Asp Asp Gly Lys 
405 410 415 

Ala Ser Ile Ser Lys Gly Ala Asn Thr Thr Glu Gly Leu Val Glu Ala 
420 425 430 

Ser Glu Leu Val Glu Ser Leu Asn Lys Leu Gly Trp Lys Val Gly Val 
435 440 445 

Glu Lys Val Gly Ser Gly Glu Leu Asp Gly Thr Ser Lys Glu Thr Leu 
450 455 460 

Val Lys Ser Gly Asp Lys Val Thr Leu Lys Ala Gly Asp Asn Leu Lys 
465 470 475 480 

Val Lys Gln Glu Gly Thr Asn Phe Thr Tyr Ala Leu Lys Asp Glu Leu 
485 490 495 

Thr Gly Val Lys Ser Val Glu Phe Lys Asp Thr Ala Asn Gly Ala Asn 
500 505 510 

Gly Ala Ser Thr Lys Ile Thr Lys Asp Gly Leu Thr Ile Thr Leu Ala 
515 520 525 

Asn Gly Ala Asn Gly Ala Thr Val Thr Asp Ala Asp Lys Ile Lys Val 
530 535 540 

Ala Ser Asp Gly Ile Ser Ala Gly Asn Lys Ala Val Lys Asn Val Ala 
545 550 555 560 

Ala Gly Glu Ile Ser Ala Thr Ser Thr Asp Ala Ile Asn Gly Ser Gln 
565 570 575 

Leu Tyr Ala Val Ala Lys Gly Val Thr Asn Leu Ala Gly Gln Val Asn 
580 585 590 

Asn Leu Glu Gly Lys Val Asn Lys Val Gly Lys Arg Ala Asp Ala Gly 
595 600 605 

Thr Ala Ser Ala Leu Ala Ala Ser Gln Leu Pro Gln Ala Thr Met Pro 
610 615 620 

Gly Lys Ser Met Val Ser Ile Ala Gly Ser Ser Tyr Gln Gly Gln Asn 
625 630 635 640 

Gly Leu Ala Ile Gly Val Ser Arg Ile Ser Asp Asn Gly Lys Val Ile 
645 650 655 

Ile Arg Leu Ser Gly Thr Thr Asn Ser Gln Gly Lys Thr Gly Val Ala 
660 665 670 

Ala Gly Val Gly Tyr Gln Trp 
675 
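
The stated lengths of SEQ ID NO: 14 (2037 base pairs) and SEQ ID NO: 15 (679 amino acids) are consistent with a single open reading frame, since 2037 = 3 x 679. The sketch below is illustrative only and is not part of the original listing. It assumes the standard genetic code and a reading frame beginning at nucleotide 1, and it translates only the first 60 nucleotides of SEQ ID NO: 14. The function name translate is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): residue
    for codon, residue in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# First 60 nucleotides of SEQ ID NO: 14, copied from the listing.
SEQ14_FIRST_60 = (
    "ATGAACAAAA" "TTTTTAACGT" "TATTTGGAAT"
    "GTTGTGACTC" "AAACTTGGGT" "TGTCGTATCT"
)


def translate(dna: str) -> str:
    """Translate DNA in frame 1 into one-letter amino acid codes.

    Trailing nucleotides that do not form a complete codon are ignored.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


if __name__ == "__main__":
    print(translate(SEQ14_FIRST_60))  # MNKIFNVIWNVVTQTWVVVS
```

The 20 residues obtained correspond to residues 1 to 20 of SEQ ID NO: 15 (Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr Gln Thr Trp Val Val Val Ser).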

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 



(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

CCGTGCTTGC CCAACACGCT T 21 



(2) INFORMATION FOR SEQ ID NO: 17: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GCTGCCACCT TGCACAACAA C 21 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CTTTCAATGC CAGAAAGTAG G 21 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: unknown 

(D) TOPOLOGY: unknown 

(ii) MOLECULE TYPE: DNA (genomic) 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

CTTCAACCGT TGCGGACAAC A 21 

CLAIMS 

We claim: 

1. A recombinant Haemophilus adhesion protein. 



2. A recombinant Haemophilus adhesion protein according to claim 1 which has 
a sequence homologous to that shown in Figure 2. 

3. A recombinant Haemophilus adhesion protein according to claim 1 which has 
a sequence homologous to the amino acid sequence shown in Figure 3. 

4. A recombinant Haemophilus adhesion protein according to claim 1 which has 
the sequence shown in Figure 2. 

5. A recombinant Haemophilus adhesion protein according to claim 1 which has 
the amino acid sequence shown in Figure 3. 

6. A recombinant nucleic acid encoding an Haemophilus adhesion protein. 

7. The nucleic acid of claim 6 comprising DNA having a sequence homologous to 
that shown in Figure 1. 

8. The nucleic acid of claim 6 comprising DNA having a sequence homologous to 
that shown in Figure 3. 

9. The nucleic acid of claim 6 comprising DNA capable of hybridizing to that shown 
in Figure 1. 

10. The nucleic acid of claim 6 comprising DNA capable of hybridizing to that shown 
in Figure 3. 

11. The nucleic acid of claim 6 comprising DNA having the sequence shown in 
Figure 1. 

12. The nucleic acid of claim 6 comprising DNA having the sequence shown in 
Figure 3. 

13. An expression vector comprising transcriptional and translational regulatory 
nucleic acid operably linked to nucleic acid encoding an Haemophilus adhesion 
protein. 

14. A host cell transformed with an expression vector comprising a nucleic acid 
encoding an Haemophilus adhesion protein. 

15. A method of producing an Haemophilus adhesion protein comprising: 

a) culturing a host cell transformed with an expression vector comprising 
a nucleic acid encoding an Haemophilus adhesion protein; and 

b) expressing said nucleic acid to produce an Haemophilus adhesion protein. 

16. A vaccine comprising a pharmaceutically acceptable carrier and an Haemophilus 
adhesion protein for prophylactic or therapeutic use in generating an immune 
response. 

17. A vaccine according to claim 16 wherein said Haemophilus adhesion protein 
has a sequence homologous to that shown in Figure 2. 

18. A vaccine according to claim 16 wherein said Haemophilus adhesion protein 
has a sequence homologous to the amino acid sequence shown in Figure 3. 

19. A monoclonal antibody capable of binding to an Haemophilus adhesion protein. 

20. A method of treating or preventing Haemophilus influenzae infection comprising 
administering the vaccine of claim 16. 

21. A method of treating or preventing a Haemophilus influenzae infection according 
to claim 20 wherein said H. influenzae infection is caused by a non-typable H. 
influenzae. 

ATGAACAAAA TTTTTAACGT TATTTGGAAT GTTGTGACTC AAACTTGGGT TGTCGTATCT 60 

GAACTCACTC GCACCCACAC CAAATGCGCC TCCGCCACCG TGGCGGTTGC CGTATTGGCA 120 

ACCCTGTTGT CCGCAACGGT TGAGGCGAAC AACAATACTC CTGTTACGAA TAAGTTGAAG 180 

GCTTATGGCG ATGCGAATTT TAATTTCACT AATAATTCGA TAGCAGATGC AGAAAAACAA 240 

GTTCAAGAGG CTTATAAAGG TTTATTAAAT CTAAATGAAA AAAATGCGAG TGATAAACTG 300 

TTGGTGGAGG ACAATACTGC GGCGACCGTA GGCAATTTGC GTAAATTGGG CTGGGTATTG 360 

TCTAGCAAAA ACGGCACAAG GAACGAGAAA AGCCAACAAG TCAAACATGC GGATGAAGTG 420 

TTGTTTGAAG GCAAAGGCGG TGTGCAGGTT ACTTCCACCT CTGAAAACGG CAAACACACC 480 

ATTACCTTTG CTTTAGCGAA AGACCTTGGT GTGAAAACTG CGACTGTGAG TGATACCTTA 540 

ACGATTGGCG GTGGTGCTGC TGCAGGTGCT ACAACAACAC CGAAAGTGAA TGTAACTAGT 600 

ACAACTGATG GCTTGAAGTT CGCTAAAGAT GCTGCGGGTG CTAATGGCGA TACTACGGTT 660 

CACTTGAATG GTATTGGTTC AACCTTGACA GACACGCTTG TGGGTTCTCC TGCTACTCAT 720 

ATTGACGGAG GAGATCAAAG TACGCATTAC ACTCGTGCAG CAAGTATCAA GGATGTCTTG 780 

AATGCGGGTT GGAATATCAA GGGTGTTAAA GCTGGCTCAA CAACTGGTCA ATCAGAAAAT 840 

GTCGATTTTG TTCATACTTA CGATACTGTT GAGTTCTTGA GTGCGGATAC AGAGACCACG 900 

ACTGTTACTG TAGATAGCAA AGAAAACGGT AAGAGAACCG AAGTTAAAAT CGGTGCGAAG 960 

ACTTCTGTTA TCAAAGAAAA AGACGGTAAG TTATTTACTG GAAAAGCTAA CAAAGAGACA 1020 

AATAAAGTTG ATGGTGCTAA CGCGACTGAA GATGCAGACG AAGGCAAAGG CTTAGTGACT 1080 

GCGAAAGATG TGATTGACGC AGTGAATAAG ACTGGTTGGA GAATTAAAAC AACCGATGCT 1140 

AATGGTCAAA ATGGCGACTT CGCAACTGTT GCATCAGGCA CAAATGTAAC CTTTGCTAGT 1200 

GGTAATGGTA CAACTGCGAC TGTAACTAAT GGCACCGATG GTATTACCGT TAAGTATGAT 1260 

GCGAAAGTTG GCGACGGCTT AAAACTAGAT GGCGATAAAA TCGCTGCAGA TACGACCGCA 1320 

CTTACTGTGA ATGATGGTAA GAACGCTAAT AATCCGAAAG GTAAAGTGGC TGATGTTGCT 1380 

TCAACTGACG AGAAGAAATT GGTTACAGCA AAAGGTTTAG TAACAGCCTT AAACAGTCTA 1440 

AGCTGGACTA CAACTGCTGC TGAGGCGGAC GGTGGTACGC TTGATGGAAA TGCAAGTGAG 1500 

CAAGAAGTTA AAGCGGGCGA TAAAGTAACC TTTAAAGCAG GCAAGAACTT AAAAGTGAAA 1560 

CAAGAGGGCG CGAACTTTAC TTATTCACTG CAAGATGCTT TAACAGGCTT AACGAGCATT 1620 

ACTTTAGGTA CAGGAAATAA TGGTGCGAAA ACTGAAATCA ACAAAGACGG CTTAACCATC 1680 

ACACCAGCAA ATGGTGCGGG TGCAAATAAT GCAAACACCA TCAGCGTAAC CAAAGACGGC 1740 

ATTAGTGCGG GCGGTCAGTC GGTTAAAAAC GTTGTGAGCG GACTGAAGAA ATTTGGTGAT 1800 

GCGAATTTCG ATCCGCTGAC TAGCTCCGCC GACAACTTAA CGAAACAAAA TGACGATGCC 1860 


iAlAAAWA* A IwAvWiAi A A 


GGATGAAAAA GGTACAGACA 


AGCAAACTCC 


AGTTGTTGCC 


1920 


UAUAA 1 a w wU w w UviuiwwU A 


GGGCGATTTG 


CGCGGCTTGG 


GCTGGGTCAT 


TTCTGCGGAC 




AAAACCACAG GCGGCTCAAC GGAATATCAC GATCAAGTTC GGAATGCGAA CGAAGTGAAA 2040 

TTCAAAAGCG GCAACGGTAT CAATGTTTCC GGTAAAACGG TCAACGGTAG GCGTGAAATT 2100 

ACTTTTGAAT TGGCTAAAGG TGAAGTGGTT AAATCGAATG AATTTACCGT CAAAGAAACC 2160 

AATGGAAAGG AAACGAGCCT GGTTAAAGTT GGCGATAAAT ATTACAGCAA AGAGGATATT 2220 

GACTTAACAA CAGGTCAGCC TAAATTAAAA GATGGCAATA CAGTTGCTGC GAAATATCAA 2280 

GATAAAGGTG GCAAAGTCGT TTCTGTAACG GATAATACTG AAGCTACCAT AACCAACAAA 2340 

GGTTCTGGCT ATGTAACAGG TAACCAAGTG GCAGATGCGA TTGCGAAATC AGGCTTTGAG 2400 

CTTGGCTTGG CTGATGAAGC TGATGCGAAA CGGGCGTTTG ATGATAAGAC AAAAGCCTTA 2460 

TCTGCTGGTA CAACGGAAAT TGTAAATGCC CACGATAAAG TCCGTTTTGC TAATGGTTTA 2520 

AATACCAAAG TGAGCGCGGC AACGGTGGAA AGCACCGATG CAAACGGCGA TAAAGTGACC 2580 

ACAACCTTTG TGAAAACCGA TGTGGAATTG CCTTTAACGC AAATCTACAA TACCGATGCA 2640 
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AAATCACTAA 

nnnxwnw xnn 


AGTTGTCAAA GATGGGCAAA 


CTAAATGGTA TGAACTGAAT 


2700 


ww A VIAVww X A 


CGGCTGATAT 

WWWW X 


GACCAAAGAA GTTACCCTCG 

wiwwfinnwin * »«»www * »*w 


GTAACGTGGA TTCAGACGGC 


2760 


ftftVtftftftwX X W 


TGAAAGACAA 


CGATGGCAAG TGGTATCACG 


CCAAAGCTGA CGGTACTGCG 


2820 


GATAAAACCA AAGGCGAAGT GAGCAATGAT AAAGTTTCTA CCGATGAAAA ACACGTTGTC 2880 

AGCCTTGATC CAAATGATCA ATCAAAAGGT AAAGGTGTCG TGATTGACAA TGTGGCTAAT 2940 

GGCGATATTT CTGCCACTTC CACCGATGCG ATTAACGGAA GTCAGTTGTA TGCTGTGGCA 3000 

AAAGGGGTAA CAAACCTTGC TGGACAAGTG AATAATCTTG AGGGCAAAGT GAATAAAGTG 3060 

GGCAAACGTG CAGATGCAGG TACAGCAAGT GCATTAGCGG CTTCACAGTT ACCACAAGCC 3120 

ACTATGCCAG GTAAATCAAT GGTTGCTATT GCGGGAAGTA GTTATCAAGG TCAAAATGGT 3180 

TTAGCTATCG GGGTATCAAG AATTTCCGAT AATGGCAAAG TGATTATTCG CTTGTCAGGC 3240 

ACAACCAATA GTCAAGGTAA AACAGGCGTT GCAGCAGGTG TTGGTTACCA GTGG 3294 

Met Asn Lys Ile Phe Asn Val Ile Trp Asn Val Val Thr Gln Thr Trp 
1 5 10 15 

Val Val Val Ser Glu Leu Thr Arg Thr His Thr Lys Cys Ala Ser Ala 
20 25 30 

Thr Val Ala Val Ala Val Leu Ala Thr Leu Leu Ser Ala Thr Val Glu 
35 40 45 

Ala Asn Asn Asn Thr Pro Val Thr Asn Lys Leu Lys Ala Tyr Gly Asp 
50 55 60 

Ala Asn Phe Asn Phe Thr Asn Asn Ser Ile Ala Asp Ala Glu Lys Gln 
65 70 75 80 

Val Gln Glu Ala Tyr Lys Gly Leu Leu Asn Leu Asn Glu Lys Asn Ala 
85 90 95 

Ser Asp Lys Leu Leu Val Glu Asp Asn Thr Ala Ala Thr Val Gly Asn 
100 105 110 

Leu Arg Lys Leu Gly Trp Val Leu Ser Ser Lys Asn Gly Thr Arg Asn 
115 120 125 

Glu Lys Ser Gln Gln Val Lys His Ala Asp Glu Val Leu Phe Glu Gly 
130 135 140 

Lys Gly Gly Val Gln Val Thr Ser Thr Ser Glu Asn Gly Lys His Thr 
145 150 155 160 

Ile Thr Phe Ala Leu Ala Lys Asp Leu Gly Val Lys Thr Ala Thr Val 
165 170 175 

Ser Asp Thr Leu Thr Ile Gly Gly Gly Ala Ala Ala Gly Ala Thr Thr 
180 185 190 

Thr Pro Lys Val Asn Val Thr Ser Thr Thr Asp Gly Leu Lys Phe Ala 
195 200 205 

Lys Asp Ala Ala Gly Ala Asn Gly Asp Thr Thr Val His Leu Asn Gly 
210 215 220 

Ile Gly Ser Thr Leu Thr Asp Thr Leu Val Gly Ser Pro Ala Thr His 
225 230 235 240 

Ile Asp Gly Gly Asp Gln Ser Thr His Tyr Thr Arg Ala Ala Ser Ile 
245 250 255 

Lys Asp Val Leu Asn Ala Gly Trp Asn Ile Lys Gly Val Lys Ala Gly 
260 265 270 

Ser Thr Thr Gly Gln Ser Glu Asn Val Asp Phe Val His Thr Tyr Asp 
275 280 285 

Thr Val Glu Phe Leu Ser Ala Asp Thr Glu Thr Thr Thr Val Thr Val 
290 295 300 

Asp Ser Lys Glu Asn Gly Lys Arg Thr Glu Val Lys Ile Gly Ala Lys 
305 310 315 320 

Thr Ser Val Ile Lys Glu Lys Asp Gly Lys Leu Phe Thr Gly Lys Ala 
325 330 335 

Asn Lys Glu Thr Asn Lys Val Asp Gly Ala Asn Ala Thr Glu Asp Ala 
340 345 350 

Asp Glu Gly Lys Gly Leu Val Thr Ala Lys Asp Val Ile Asp Ala Val 
355 360 365 

Asn Lys Thr Gly Trp Arg Ile Lys Thr Thr Asp Ala Asn Gly Gln Asn 
370 375 380 

Gly Asp Phe Ala Thr Val Ala Ser Gly Thr Asn Val Thr Phe Ala Ser 
385 390 395 400 

Gly Asn Gly Thr Thr Ala Thr Val Thr Asn Gly Thr Asp Gly Ile Thr 
405 410 415 

Val Lys Tyr Asp Ala Lys Val Gly Asp Gly Leu Lys Leu Asp Gly Asp 
420 425 430 

Lys Ile Ala Ala Asp Thr Thr Ala Leu Thr Val Asn Asp Gly Lys Asn 
435 440 445 

Ala Asn Asn Pro Lys Gly Lys Val Ala Asp Val Ala Ser Thr Asp Glu 
450 455 460 

Lys Lys Leu Val Thr Ala Lys Gly Leu Val Thr Ala Leu Asn Ser Leu 
465 470 475 480 

Ser Trp Thr Thr Thr Ala Ala Glu Ala Asp Gly Gly Thr Leu Asp Gly 
485 490 495 

Asn Ala Ser Glu Gln Glu Val Lys Ala Gly Asp Lys Val Thr Phe Lys 
500 505 510 

Ala Gly Lys Asn Leu Lys Val Lys Gln Glu Gly Ala Asn Phe Thr Tyr 
515 520 525 

Ser Leu Gln Asp Ala Leu Thr Gly Leu Thr Ser Ile Thr Leu Gly Thr 
530 535 540 

Gly Asn Asn Gly Ala Lys Thr Glu Ile Asn Lys Asp Gly Leu Thr Ile 
545 550 555 560 

Thr Pro Ala Asn Gly Ala Gly Ala Asn Asn Ala Asn Thr Ile Ser Val 
565 570 575 

Thr Lys Asp Gly Ile Ser Ala Gly Gly Gln Ser Val Lys Asn Val Val 
580 585 590 

Ser Gly Leu Lys Lys Phe Gly Asp Ala Asn Phe Asp Pro Leu Thr Ser 
595 600 605 

Ser Ala Asp Asn Leu Thr Lys Gln Asn Asp Asp Ala Tyr Lys Gly Leu 
610 615 620 

Thr Asn Leu Asp Glu Lys Gly Thr Asp Lys Gln Thr Pro Val Val Ala 
625 630 635 640 

Asp Asn Thr Ala Ala Thr Val Gly Asp Leu Arg Gly Leu Gly Trp Val 
645 650 655 

Ile Ser Ala Asp Lys Thr Thr Gly Gly Ser Thr Glu Tyr His Asp Gln 
660 665 670 

Val Arg Asn Ala Asn Glu Val Lys Phe Lys Ser Gly Asn Gly Ile Asn 
675 680 685 

Val Ser Gly Lys Thr Val Asn Gly Arg Arg Glu Ile Thr Phe Glu Leu 
690 695 700 

Ala Lys Gly Glu Val Val Lys Ser Asn Glu Phe Thr Val Lys Glu Thr 
705 710 715 720 

Asn Gly Lys Glu Thr Ser Leu Val Lys Val Gly Asp Lys Tyr Tyr Ser 
725 730 735 

Lys Glu Asp Ile Asp Leu Thr Thr Gly Gln Pro Lys Leu Lys Asp Gly 
740 745 750 

Asn Thr Val Ala Ala Lys Tyr Gln Asp Lys Gly Gly Lys Val Val Ser 
755 760 765 

Val Thr Asp Asn Thr Glu Ala Thr Ile Thr Asn Lys Gly Ser Gly Tyr 
770 775 780 

Val Thr Gly Asn Gln Val Ala Asp Ala Ile Ala Lys Ser Gly Phe Glu 
785 790 795 800 

Leu Gly Leu Ala Asp Glu Ala Asp Ala Lys Arg Ala Phe Asp Asp Lys 
805 810 815 

Thr Lys Ala Leu Ser Ala Gly Thr Thr Glu Ile Val Asn Ala His Asp 
820 825 830 

Lys Val Arg Phe Ala Asn Gly Leu Asn Thr Lys Val Ser Ala Ala Thr 
835 840 845 

Val Glu Ser Thr Asp Ala Asn Gly Asp Lys Val Thr Thr Thr Phe Val 
850 855 860 

Lys Thr Asp Val Glu Leu Pro Leu Thr Gln Ile Tyr Asn Thr Asp Ala 
865 870 875 880 

Asn Gly Lys Lys Ile Thr Lys Val Val Lys Asp Gly Gln Thr Lys Trp 

885 890 895 

Tyr Glu Leu Asn Ala Asp Gly Thr Ala Asp Met Thr Lys Glu Val Thr 
900 905 910 

Leu Gly Asn Val Asp Ser Asp Gly Lys Lys Val Val Lys Asp Asn Asp 
915 920 925 

Gly Lys Trp Tyr His Ala Lys Ala Asp Gly Thr Ala Asp Lys Thr Lys 
930 935 940 

Gly Glu Val Ser Asn Asp Lys Val Ser Thr Asp Glu Lys His Val Val 
945 950 955 960 

Ser Leu Asp Pro Asn Asp Gln Ser Lys Gly Lys Gly Val Val Ile Asp 
965 970 975 

Asn Val Ala Asn Gly Asp Ile Ser Ala Thr Ser Thr Asp Ala Ile Asn 
980 985 990 

Gly Ser Gln Leu Tyr Ala Val Ala Lys Gly Val Thr Asn Leu Ala Gly 
995 1000 1005 

Gln Val Asn Asn Leu Glu Gly Lys Val Asn Lys Val Gly Lys Arg Ala 
1010 1015 1020 

Asp Ala Gly Thr Ala Ser Ala Leu Ala Ala Ser Gln Leu Pro Gln Ala 
1025 1030 1035 1040 

Thr Met Pro Gly Lys Ser Met Val Ala Ile Ala Gly Ser Ser Tyr Gln 
1045 1050 1055 

Gly Gln Asn Gly Leu Ala Ile Gly Val Ser Arg Ile Ser Asp Asn Gly 
1060 1065 1070 

Lys Val Ile Ile Arg Leu Ser Gly Thr Thr Asn Ser Gln Gly Lys Thr 
1075 1080 1085 

Gly Val Ala Ala Gly Val Gly Tyr Gln Trp 
1090 1095 
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1 TTTOTTTTTCTOAT MW1T1W €0 

• 

61 AAATATCACTTTTTTATTCTCCAAATATAGAATAGAATACGCACGATTTCACTAAGAAAA 120 



121 GTATATTTATCATTAATTTTATTAAATATAAGCTA^TAAAAATGAACAAAATTTTTAAC 180 

M N K I F N 

♦ 

181 GTTATTTGGAAT6TTATQACTCAAACTT6GGTTGTC6TATCTGAACTCACTCGCACCCAC 240 
VIWNVMTQTWVVVSELTRTH 

241 ACCAAACGCGCCTCCGCAACCGTGGAGACCGCCGTATTGGCGACACTGTTGTTTGCAACG 300 
TKRASATVETAVLATLLFAT 

301 GTTCAGGCGAATGCTACCGATGAAGATGAAGAGTTAGACCCCGTAGTACGCACTGCTCCC 360 
VQANATDEDEELDPVVR TAP 

361 GTGTTGAGCTTCCATTCCGATAAAGAAGGCACGGGAGAAAAAGAAGTTACAGAAAATTCA 420 
VLSFHSDKEGTGEKEVTENS 

421 AATTGGGGAATATATTTCGACAATAAAGGAGTACTAAAAGCCGGAGCAATCACCCTCAAA 480 
NWGIYFDNKGVLRAGAITLK 

481 GCCGGCGACAACCTGAAAATCAAACAAAACACCGATGAAAGCACCAATGCCAGTAGCTTC 540 
AGDNLKIKQNTDESTNASSF 

541 ACCTACTCGCTGAAAAAAGACCTCACAGATCTGACCAGTGTTGCAACTGAAAAATTATCG 600 
TYSLKKDLTDLTSVATERLS 

601 TTTGGCGCAAACGGCGATAAAGTTGATATTACCAGTGATGCAAATGGCTTGAAATTGGCG 660 
FGANGDKVDITSDANGLKLA 

661 AAAACAGGTAACGGAAATGTTCATTTGAATGGTTTGGATTCAACTTTGCCTGATGCGGTA 720 
KT GNGNVHLNGLDSTLPDAV 

721 ACGAATACAGGTGTGTTAAGTTCATCAAGTTTTACACCTAATGATGTTGAAAAAACAAGA 780 
TNTGVLSSSSFTPNDVERTR 

781 GCTGCAACTGTTAAAGATGTTTTAAATGCAGGTTGGAACATTAAAGGTGCTAAAACTGCT 840 
AATVKDVL NAGW NIRGAKTA 

841 GGAGGTAATGTTGAGAGTGTTGATTTAGTGTCCGCTTATAATAATGTTGAATTTATTACA 900 
GGNVESVDLVSAYNNVEFIT 

901 GGCGATAAAAACACGCTTGATGTTGTATTAACAGCTAAAGAAAACGGTAAAACAACCGAA 960 
GDK NTLDVVLTAKENGRTTE 

961 GTGAAATTCACACCGAAAACCTCTGTTATCAAAGAAAAAGACGGTAAGTTATTTACTGGA 1020 
VKFTPKTSVIKEKDGKLFTG 

1021 AAAGAGAATAACGACACAAATAAAGTTACAAGTAACACGGCGACTGATAATACAGATGAG 1080 
K E NNDTNKVTSNTATDNTDE 

1081 GGTAATGGCTOAGTCACTGCAAAAGCTGTGATTGATGCTGTGAACAAGGCTGGTTGGAGA 1140 
GNGLVTAKAVIDAVNKAGWR 
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1141 GTTAAAACAACTACTGCTAATGGTCAAAATGGCGACTO 1200 
VKTTTANGQNGDFATVASGT 

» • * • • • 

12 0 1 AATGTAACCTTTGAAAGTGGCGATGGTACAACAGCGTCAGTAACTAAAGATACTAACGGC 1260 

NVTFESGDGTTASVTKDTNG 

» • • • • • 

1261 AATGGCATCACTGTTAAGTACGACGCGAAAGTTGGCGACGGCTTGAAATTTGATAGCGAT 1320 
NGITVKYDAKVGDGLKFDSD 

» • • • • • 

1321 AAAAAAATCGTTGCAGATACGACCGCACTTACTGTGACAGGTGGTAAGGTAGCTGAAATT 1380 
KKIVADTTALTVTGGKVAEI 

. • • • » • 

13 8 1 GCTAAAGAAGATGACAAGAAAAAACTTGTTAATGCAGGCGATTTGGTAACAGCTTTAGGT 144 0 

AKEDDKKKLVNAGDL V T A L G 

144 1 AATCTAAGTTGGAAAGCAAAAGCTGAGGCTGATACTGATGGTGCGCTTGAGGGGATOT 1500 
NLSWKAKAEADTDGALEGIS 

• • • • 

1501 AAAGACCAAGAAGTCAAAGCAGGCGAAACGGTAACCTTTAAAGCGGGCAAGAACTTAAAA 1560 
KDQEVKAGET VTFKA GKNLK 

1561 GTGAAACAGGATGGTGCGAACTTTACTTATTCACTGCAAGATGCTTTAACGGGTTTAACG 1620 
VKQDGANFTYSLQDALTGLT 

1621 AGCATTACTTTAGGTGGTACAACTAATGGCGGAAATGATGCGAAAACCGTCATCAACAAA 1680 
SITLGGTTNGGNDAKT VINK 

• ••••• 

1681 GACGGTTTAACCATCACGCCAGCAGGTAATGGCGGTACGACAGGTACAAACACCATCAGC 1740 
DGLTITPAGNGGTTGTNTIS 

1741 GTAACCAAAGATGGCATTAAAGCAGGTAATAAAGCTATTACTAATGTTGCGAGTGGTTTA 1800 
VTKDGIKAGNKAITNVASGL 

..*••• 
1801 AGAGCTTATGACGATGCGAATTTTGATGTTTTAAATAACTCTGCAACTGATTTAAATAC5A 1860 
RAYDDANFDVLNNSATDLNR 

1861 CACGTTGAAGATGCTTATAAAGGTTTATTAAATCTAAATGAAAAAAATGCAAATAAACAA 192 0 
HVEDA YKG LLNLNEKNANKQ 

1921 CCGTTGGTGACTGACAGCACGGCGGCGACTGTAGGCGATTTACGTAAATTGGGTTGGGTA 1980 
PLVTDSTAATVGDLRKLGWV 

1981 GTATCAACCAAAAACGGTACGAAAGAAGAAAGCAATCAAGTTAAACAAGCTGATGAAGTC 2040 
VSTKNGTKEESNQVKQADEV 

2041 CTCTTTACCGGAGCCGGTGCTGCTACGGTTACTTCCAAATCTGAAAACGGTAAACATACG 2100 
LFTGAGAATVTSKSENGKHT 

2101 ATTACCGTTAGTGTGGCTGAAACTAAAGCGGATTGCGGTCTTGAAAAAGATGGCGATACT 2160 
ITVSVAETKADCGLEKDGDT 

2161 ATTAAGCTCAAAGTGGATAATCAAAACACTGATAATGTTTTAACTGTTGGTAATAATGGT 2220 
IKLKVDNQNTDNVLTVGNNG 

222 1 ACTGCTGTCACTAAAGGTGGCTTTGAAACTGTTAAAACTGGAGCGACTGATGCAGATCGC 2280 
TAVTKGGFETVKTGATDADR 

2281 GGTAAAGTAACTGTAAAAGATGCTACTGCTAATGACGCTGATAAGAAAGTCGCAACTGTA 2340 
GKVTVKDATANDADKKVATV 

2341 AAAGATGTTGCAACCGCAATTAATAGTGCGGCGACTTTTGTGAAAACAGAGAATTTAACT 2400 
KDVATAINSAATFVKTENLT 

2401 ACCTCTATTGATGAAGATAATCCTACAGATAACGGCA^ 2460 
TSIDEDNPTDNGKDDALKAG 

2461 GATACCTTAACCTTTAAAGCAGGTAAAAACCTGAAAGTTAAACGTGATGGAAAAAATATT 2520 
DTLTFKAGKNLKVKRDGKNI 

2521 ACTTTTGACTTGGCGAAAAACCTTGAGGTGAAAACTC 2580 
TFDLAKNLEVKTAKVSDTLT 

2581 ATTGGCGGGAATACACCTAC^GGTGGCACTACTGCGACGCCyU^AAGTGAATATTACTArc 2640 
IGGNTPTGGTTATPKVNITS 

2641 ACGGCTGATGGTTTGAATTTTGCAAAAGAAACAGCCGATGCCTCGGGTTCTAAGAATO 2700 
TADGLNFAKETADASGSKNV 

2701 TATTTGAAAGGTATTGCGACAACTTTAACTGAGCCAAGCGCGGGAGCGAAGTC 2760 
YLKGIATTLTEPSAGAKSSH 

2761 GTTGATTTAAATGTGGATGCGACGAAAAAATCCAATGCAGCAAGTATTGAAGATGTATTG 2820 
VDLNVDATKKSNAASIEDVL 

2821 CGCGCAGGTTGGAATATTCAAGGTAATGGTAATAATGTTGATTATGTAGCGACGTATGAC 2880 
RAGWNIQGNGNNVDYVATYD 

2881 ACAGTAAACTTTACCGATGACAGCACAGGTACAACAACGGTAACCGTAACCCAAAAAGCA 2940 
TVKFTDDSTGTTTVTVTQKA 

2941 GATGGCAAAGGTGCimCGTTAAAATCGGTGCGAAAACTTC 3000 
DGKGADVKIGAKTSVIKDHN 

3001 GGCAAACTGTTTACAGGCAAAGACCTGAAAGATGCGAATAATGGTGCAACCGTTAGTGAA 3060 
GKLFTGKDLKDANNGATVSE 

3061 GATGATGGCAAAGACACCGGCACAGGCTTAGTTACTGCAAAAACTGTGATTGATGCAGTA 3120 
DDGKDTGTGLVTAKTVIDAV 

3121 AATAAAAGCGGTTGGAGGGTAACCGGTGAGGGCGCGACTGCCGAAACCGGTGCAACCGCC 3180 
NKSGWRVTGEGATAETGATA 

3181 GTGAATGCGGGTAACGCTGAAACCGTTACATCAGGCACGAGCGTGAACTTCAAAAACGGC 3240 
VNAGNAETVTSGTSVNFKNG 

3241 AATGCGACCACAGCGACCGTAAGCAAAGATAATGGCAACATCAATGTCAAATACGATGTA 3300 
NATTATVSKDNGNINVKYDV 

3301 AATGTTGGTGACGGCTTGAAGATTGGCGATGACAAAAAAATCGTTGCAGACACGACCACA 3360 
NVGDGLKIGDDKKIVADTTT 

3361 CTTACTGTAACAGGTGGTAAGGTGTCTGTTCCTGCTGGTGCTAATAGTGTTAATAACAAT 3420 
LTVTGGKVSVPAGANSVNNN 
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• • • • • • 

3421 AAGAAACTTGTTAAT6CA6AG66TTTA6CGACT6CTTTAA&CAACCTAAGCT6GAC66^ 3480 
KKLVNAEGLATALNNLSWTA 

3481 AAAGCCGATAAATATGCAGATGGCGAGTCAGAGGGCGAAACCGACCAAGAAGTCAAAGCA 3540 
KADKYADGESEGETD QEVKA 

3541 GGCGACAAAGTAACCTTTAAAGCAGGCAAGAACTTAAAAGTGAAACAGTCTGAAAAAGAC 3600 
GDKVTPKAGKNLKVKQSEKD 

• • • • • • 

3601 TTTACTTATTCACTGCAAGACACTTTAACAGGCTTAACGAGCA 3660 
FTYSLQDTLTGLTSITLGGT 

• • • • • • 

3661 GCTAATGGCAGAAATGATACGGGAACCGTCATCAACAAAGACGGCTTAACCATCACGCTG 3720 
ANGRNDTGTVINKDGLTITL 

3721 GCAAATGGTGCTGCGGCAGGCACAGATGCGTCTAACGGAAACACCATCAGTGTAA 3780 
ANGAAAGTDASNGNTI SVTK 

3781 GACGGCATTAGTGCGGGTAATAAAGAAATTACC^TGTTAAGAGTGCTTO 3840 
DGISAGNKEITNVK SALKTY 

3841 AAAGATACTCAAAACACTGCAGATGAAACACAAGATAAAGAGTTCCACGCCGCCGTTAAA 3900 
KDTQNTADETQDKEFHAAVK 

• • • • * • 

3901 AACGCAAATGAAGTTGAGTTCGTGGGTAAAAACGGTGCAACCGTGTCTGCAAAAACTGAT 3960 
NANEV EFVGKNGATVSAKTD 

3961 AACAACGGAAAACATACTGTAACGATTGATGTTGCAGAAGCCAAAGTTGGTGATGGTCTT 4020 
NNGKHTVTIDVAEAKVGDGL 

• ••••• 

4021 GAAAAAGATACTGACGGCAAGATTAAACTCAAAGTAGATAATACAGATGGGAATAATCTA 4080 
EKDTDGKIKLKVDNTDGNNL 

4081 TTAACCGTTGATGCAACAAAAGGTGCATCCGTTGCCAAGGGCGAGTTTAATGCCGTAACA 4140 
LTVDATKGASVAKGEFNAVT 

4141 AGAGATGCAACTACAGCCCAAGGCACAAATGCCAATGAGCGCGGTAAAGTGGTTGT 4200 
TDA TTAQGTNANERGKVVVK 

• ••••• 

4201 GGTTCAAATGGTGCAACTGCTACCGAAACTGACAAGAAAAAAGTGGCAACTGTTGGCGAC 4260 
GSNGATATBTDKKKVATVGD 

• ■ • • • • 

4261 GTTGCTAAAGCGATTAACGACGCAGCAACTTTCGTGAAAGTGGAAAATGACGACAGTGCT 4320 
VAKAINDAATFVKVENDDSA 

• • • » » • 

4321 ACGATTGATGATAGCCCAACAGATGATGGCGCAAATGATGCTCTCAAAGCAGGCGACACC 4380 
TIDDSPTDDGANDALKAGDT 

4381 TTGACCTTAAAAGCGGGTAAAAACOTAAAAGTTAAACGTGATGGTAAAAATATTACTTTT 4440 
LTLKAGKNLKVKRDGKNITF 

4441 GCCCTTGCGAACGACCTTAGTGTAAAAAGCGCAACCGTTAGCGATAAATTATCGCTTGGT 4500 
ALANDL SVKSATVSDKLSLG 

4501 ACAAACGGCAATAAAGTCAATATCACAAGCGACACCAAAGGCTTGAACTTCGCTAAAGAT 4560 
TNGNKVNITSDTKGLNFAKD 
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4561 ACTAAGACAGGCGATGATGCTAATATTCACTT^ 4620 
SKTGDDANIHLNGIASTLTD 

• •••*• 

4621 ACATTGTTAAATA6T66T6CGACAACCAATTTA6GTG6TAATG6TATTACTGATAACGA6 4680 
TLLNSGATTNLGGNGITDNE 

• ' * * • * • 

4681 AAAAAACGCGCGGCGAGCGTTAAAGATGTCTTGAATGCG66TT6GAATGTTCGT6GTGTT 4740 
KKRAASVKDVLNAGWNVRGV 

w • » • • ♦ 

4741 AAACCGGCATCTGCAAATAATCAAGTGGAGAATATCGACTTTGTAGCAACCTACGACACA 4800 
KPASANNQVENIDFVATYDT 

• • • * • • 

4801 GTGGACTTTGTTAGTGGAGATAAAGACACCACGAGTGTAA 4860 
VDFVSGDKDTTSVTVESKDN 

4861 GGCAAGAGAACCGAAGTTAAAATCGGTGCGAAGACTTCTGTTATCAAAGACCACAACGGC 4920 
GKRTEVKIG AKTSVIKDHNG 

» • • # • ♦ 

4921 AAACTGTTTACAGGCAAAGAGCTGAAGGATGCTAACAATAATGGCGTAACTGTTACCGAA 4980 
KLFTGKELKDANNNGVTVTE 

• «•••» 

4981 ACCGACGGCAAAGACGAGGGTAATGGTTTAGTGACTGCAAAAGCTGTGATTGATGCCGTG 5040 
TDGKDEGNG LVTAKAVIDAV 

• • • • • • 

5041 AATAAGGCTGGTTGGAGAGTTAAAACAACAGGTGCTAATGGTCAGAATGATGACTTCGCA 5100 
NKAGWRVKTTGANGQNDDFA 

»••••• 

5101 ACTGTTGCGTCAGGCACAAATGTAACCTTTGCTGATGGTAATGGCACAACTGCCGAAGTA 5160 
TVASGTNVTFADGNGTTAEV 

• ••*»• 

5161 ACTAAAGCAAACGACGGTAGTATTACTGTTAAATACAATGTTAAAGTGGCTGATGGCTTA 5220 
TKANDGS I TVKYNVKVADGL 

»*«••• 
5221 AAACTAGACGGCGATAAAATCGTTGCAGACACGACCGTACTTACTGTGGCAGATGGTAAA 5280 
KLDGDKIVADTTVLTVADGK 

» • • * ♦ • 

5281 GTTACAGCTCCGAATAATGGCGATGGTAAGAAATTTGTTGATGCAAGTGGTTTAGCGGAT 5340 
V T A P N N G D G K K F V DA S G LA D 

• • • • • • 

5341 GCGTTAAATAAATTAAGCTGGACGGCAACTGCTGGTAAAGAAGGCACTGGTGAAGTTGAT 5400 
ALNKLSWTATAGKBGTGEVD 

, • • • • • 

5401 CCTGCAAATTCAGCAGGGCAAGAAGTCAAAGCGGGCGACAAAGTAACCTTTAAAGCCGGC 5460 
PANSAGQEVKAGDKVTFKAG 

»••••■ 
5461 GACAACCTGAAAATCAAACAAAGCGGCAAAGACTTTACCTACTCGCTGAAAAAAGAGCTG 5520 
DNLKIKQSGKDFTYSLKKEL 

• • • . * * • 

5521 AAAGACCTGACCAGCGTAGAGTTCAAAGACGCAAACGGCGGTACAGGCAGTGAAAGCACC 5580 
KDLTSVEFKDANGGTGSEST 

. # * • • • 

5581 AAGATTACCAAAGACGGCTTGACCATTACGCCGGCAAACGGTGCGGGTGCGGCAGGTGCA 5640 
KITKDGLT ITPANGAGAAGA 

5641 AACACTGCAAACACCATTAGCGTAACCAAAGATGGCATTAGCGCGGGTAATAAAGCAGTT 5700 
NTANTISVTKDGISAGNKAV 
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• •••»• 

5701 ACAAACGTTGTGAGCGGACTGAAGAAATTTGGTGATGGTCATA^ 5760 
TNVVSGLKKFGDGHTLANGT 

• ••••• 

5761 GTTGCTGATTTTGAAAAGCATTATGACAATGCCTATAAA^ 5820 
VADFEKHYDNAYKDLTNLDE 

5821 AAAGGCGCGGATAATAATCCGACTGTTGCCGACAATACCGCTGCAACCGTOGGCGATTTG 5880 
K6ADNNPTVADNTAATVGDL 

• • • ■ • • 

5881 CGCGGCTTGGGCTGGGTCATTTCTGCGGAOUUU^ 5940 
RGLGWVI SADKTTGEPNQEY 

• * • • * • 

5941 AACGCGCAAGTGCGTAACGCCAATGAAGTGAAATTCAAGAGCGGCAACGGTATCAA 6000 
NAQVRNANEVKFKSGNGINV 

• • • • » » 

6001 TCCGGTAAAACATTGAACGGTACGCGCGTGATTACCTTTGAATTGGCTAAAGGCGAAGTG 6060 
SGKTLNGTRVITFELAKGEV 

• • • • * • ■ 
6061 GTTAAATCGAATGAATTTACCGTTAAGAATGCCGATGGTTCGGAAACGAACTTGGTTA^ 6120 

VKSNEFTVKNADGSETNLVK

• • » » • • 

6121 GTTGGCGATATGTATTACAGCAAAGAGGATATTGACCCGGCAACCAGTAAACCGATGACA 6180 
VGDMYYSKEDIDPATSKPMT 

6181 GGTAAAACTGAAAAATATAAGGTTGAAAACGGCAAAGTCGTTTCT 6240 
GKTEKYKVENGKVVSANGSK 

6241 ACCGAAGTTACCCTAACCAACAAAGGTTCCGGCTATGTAACAGGTAACCAAGTGGCTGAT 6300 
TEVTLTNKGSGYVTGNQVAD 

• ••••• 

6301 GCGATTGCGAAATCAGGCTTTGAGCTTGGTTTGGCTGATGCGGCAGAAGCTGAAAAAGCC 6360 
AIAKSGFELGLADAAEAEKA 

• * • • • • 

6361 TTTGCAGAAAGCGCAAAAGACAAGCAATTGTCTAAAGATAAAGCG 6420 
FAESAKDKQLSKDKAETVNA

• • • * * • 

6421 CACGATAAAGTCCGTTTTGCTAATGGTTTAAATACCAAAGTGAGCGCGGCAACGGTGGAA 6480 
HDKVRFANGLNTKVSAATVE

6481 AGCACTGATGCAAACGGCGATAAAGTGACCACAACCTTTGTGAAAACCGATGTGGAATTG 6540 
STDANGDKVTTTFVKTDVEL 

• • » • • • 

6541 CCTTTAACGCAAATCTACAATACCGATGCAAACGGTAATAAGATCGTTAAAAAAGCTGAC 6600 
PLTQIYNTDANGNKIVKKAD 

6601 GGAAAATGGTATGAACTGAATGCTGATGGTACGGCGAGTAACAAAGAAGTGACACTTGGT 6660 
GKWYELNADGTASNKEVTLG 

» • • • • • 

6661 AACGTGGATGCAAACGGTAAGAAAGTTGTGAAAGTAACCGAAAATGGTGCGGATAAGTGG 6720 
NVDANGKKVVKVTENGADKW 

• »•••• 

6721 TATTACACCAATGCTGACGGTGCTGCGGATAAAACCAAAGGCGAAGTGAGCAATGATAAA 6780 
YYTNADGAADKTKGEVSNDK 

• • • * • • 

6781 GTTTCTACCGATGAAAAACACGTTGTCCGCCTTGATCCGAACAATCAATCGAACGGCAA^ 6840 
VSTDEKHVVRLDPNNQSNGK 
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6841 6GCGT66TCATTGACAATGTGGCTAAT6GC6AAATTTCTGCCACTTCCACCGAT6CGA^ 6900 
GVVIDNVANGEISATSTDAI

6901 AACGGAAGTCAGTTGTATGCCGTGGCAAAAGGGGTAACAAACCTTGCTGGACAAGTGAAT 6960 
NGSQLYAVAKGVTNLAGQVN 

6961 AATCTTGAGGGCAAAGTGAATAAAGTGGGCAAACGTGCAGATGCAGGTACAGCAAGTGCA 7020 
NLEGKVNKVGKRADAGTASA 

7021 CTAGCGGCTTCACAGTTACCACAAGCCACTATGCCAGGTAAATCAATGG 7080 
LAASQLPQATMPGKSMVAIA

7081 GGAAGTAGTTATCAAGGTCAAAATGGTTTAGCTATCGGGGTATCAAGAATTTCCGATAAT 7140 
GSSYQGQNGLAIGVSRISDN 

7141 GGCAAAGTGATTATTCGCTTGTCAGGCACAACCAATAGTCAAGGTAAAACAGGCGTTGCA 7200 
GKVIIRLSGTTNSQGKTGVA 

7201 GC A(rCT r rT' PMq1111 & r*P a fSTf^TAAAGTTTGGATTATCTCTCTT AAAAAGCGGCA TTTGCC 7260 
AGVGYQW

• • • 

7261 GCTTTTT TTATGGQTGGCTATTATGTATCGT 7291 
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150 174 608 847 1291 1 476 1914/1915 235 3 

ha2 vr-izzzzzzzm V///////M V///////M • I 

(96/86) (77/66) (67/54) (79/68) (89/84) 

658/659 1098 
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HA2 1 MNKIFNVIWKVMTQTWVVVSELTRTHTKRLRNR . GDPVLATLLFATVQA . 48 

Ml || I II I I 1:111 IIIIIMMIII I -- ' s I I I I I I 111:1 
HA1 1 MNKIFNVIWNVVTQTWVWSELTRTHTKCASATVAVAVLATLLSATVEAN 50 

. 

49 NATDEDEELDPWRTAPVLSFHSDKEGTGEKEVTENSNWGIYFDNKG. . . 95 

|.|. ..: :-l «.tll:l I. • 

51 NNTPVTNKLKAY. . GEANFNFTNNSIADAEKQVQEAYKGLLNLNEKNASD 98 

96 . . .VLKAGAITL KAGDNLKXKQXTD 117 

I ... I I : . I : : I • - 

99 KLLVEDOTAATVGNLRKLGWVLSSKNGTRNEKSCXIVKHADEVLFEGKGGV 148 

« 

118 EXTNAS SFTYSLKKDLTDLTSVATEKLSFGANGD KVDI 155 

• |.. | .si:. I III. I M:: 

149 QVTSTSENGKHTITFAIAKDLGVKTATVSDTLTIGGGAAAGATTTPKVNV 198

156 TSDANGLKLAK TGNGNVHLNGLDSTLPDAVTNTGVLSSSSFTPND 200

||..:III:M .l:..lllll::Hl.l.t « 

199 TSTTDGLKPAKDAAGANGDTTVHLNGIGSTLTDTLVGSPATHIDG . GDQS 247

201 VEKTRAATVKDVLNAGWNIKGAKTAG . .GNVESVDLVSAYNNVEFITGDK 248

. . | | | I . : I I I I I I I I I I I I . I . : : I • 1-11*1 • ' : ♦ ' ' ' 5 • s 

248 THYTRAASIKDVLNAGWNIKGVKAGSTTGQSENVDFVHTYDTVEFLSADT 297

249 NTLDVVLTAKENXKTTEVKFTPKTSVIKEKIX;KLFTGKENNDTNKVTSNT 298

:| .|.:..||| I I I llll I I II M I.I.H M I-: 

298 ETTTVTVDSKENGKRTEVKIGAKTSVTKEKDGKLFTGKANKETNKVDGAN 347

299 ATDNTDEGNGLVTAKAVTDAVNKAGWRVKTTTANGQNGDFATVASGTNVT 348

II::. III. Illlll-IIMIII. 111:111. I I I I I I I I I I I I I I I I I I 
348 ATEDADEGKGLVTAKDVIDAVNKTGWRIKTTDAHGQNGDFATVASGTNVT 397

349 FESGDGTTASVTKDTNGNGITVKYDAKVGDGLKFDSDKKrVADTTALT^ 398 
I.I |:||||.ll.: I :| I II III Mllll I 1:1:1 11.11 I Ml II. 

398 FASGNGTTATVTNGT. . DGITVKYDAKVGDGLKLDGD . KIAADTTALTVN 444 

399 G . GKVAE I AKEDDKKKL VNAGDLVT ALGNL SWKAKAEADTDG A 440 

: | | | I : : I . . I : I I I I . I : I I I I I . . I I I • • • I * ' • 

445 DGKNANNPKGKVADVASTDE . KKLVTAKGLVTALNSLSWTTTAAEADGGT 493 

441 LEG I SKDQETVKAGETVTFKAGKNLKVKQDGANFTY SLQDALTGLTS ITLG 490

1:1 .VVlMIII:.IIMMIIIMM:IMI MM Mill I 

494 LDGNASEQEVKAGDKVTFKAGKNLKVKQEGANFTYSLQDALTGLTSITLG 543

491 GTTNGGNDAKTVINKDGLTI TPAGNGGTTGTKTI SVTKDGIKAGNKAITN 540
|.|:||| M I I I I 11 I I I . . : I IIIIIIIMI.M...: ' 

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544 T. . . GNNGAKTEINKIX5LTITPANGAG^ 590 

• • • • 

541 VASGLRAYDDANFDVI*NNSATDLNW^ - PLVT 589 

| . | | | : : : I I I I I • I • • I I • = I . : : : I I I I I I I 1 r I I . . r I I I : I * 
591 VVSGIiKKFGDANFDPLTSSADNLTKQNDDAYKGLTNIjDEKGTDKQTPVVA 640 

590 DSTAATVGDLRKLGWWS 607 

I.I I I II MM Ml 1:1 
641 DNTAATVGDLRGLGWVT S 658 
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Restriction maps of phage 11-17 and plasmid pT7-7 subclones 
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1 


ATGAACAAAA TTTTTAACGT TATTTGGAAT GTTGTGACTC AAACTTGGGT
51 TGTCGTATCT GAACTCACTC GCACCCACAC CAAATGCGCC TCCGCCACCG
101 TGGCAGTTGC CGTATTGGCA ACCCTGTTGT CCGCAACGGT TCAGGCGAAT
151 GCTACCGATG AAAACGAAGA TGATGAAGAA GAGTTAGAAC CCGTACAACG
201 CTCTGTTTTA AGGTGGAGCT TCAAATCCGC TAAGGAAGGC ACTGGAGAAC
251 AAGAGGGAAC A.&n&a&rca'PA ATAAATTTGA ACACAGATTC ATCAGGAAAT
301 GCAGTAGGAA GCAGCACAAT CACCTTCAAA GCCGGCGACA ACCTGAAAAT
351 CAAACAAAGC GGCAATGACT TCACCTACTC GCTGAAAAAA GAGCTGAAAA
401 ACCTGACCAG TGTTGAAACT GAAAAATTAT CGTTTGGCGC AAACGGCAAT
451 AAAGTTGATA TTACCAGTGA TGCAAATGGC TTGAAATTGG CGAAAACAGG
501 TAACGGAAAT GGTCAAAACA GTAATGTTCA CTTAAACGGT ATTGCTTCGA
551 CTTTGACCGA TACGCTTCCC GGTGGCACAA CAGGACACGT TGACACCAAC
601 ATTGATGCGG TTAATTATCA TCGCGCTGCA AGCGTACAAG ATGTGTTAAA
651 CAGCGGTTGG AATATCCAAG GCAATGGAAA CAATGTCGAT TTTGTCCGTA
701 CTTACGACAC CGTGGACTTT GTCAATGGCG CGAATGCCAA TGTGAGCGTT
751 ACGGCTGATA CGGCTCACAA AAAGACAACT GTCCGTGTGG ATGTAACAGG
801 CTTGCCGGTT CAATATGTTA CGGAAGACGG CAAAACCGTT GTGAAAGTGG
851 GCAATGAGTA TTACAAAGCC AAAGATGACG GTTCGGCGGA TATGAATCAA
901 AAAGTCGAAA ACGGCGAGCT GGCGAAAACC AAAGTGAAAT TGGTATCGGC
951 AAGCGGTACA AATCCGGTGA AAATTAGCAA TGTTGCAGAC GGCACGGAAG
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1001 


ACACCGATGC GGTCAGCTTT AAGCAATTAA AAGCCTTGGA AGACAAACAG
1051 GTTACGTTGA GCACGAGCAA TGCTTATGCC AATGGCGGTA CAGATAACGA
1101 CGGCGGCAAG GCAACTCAAA CTTTAAGCAA TGGTTTGAAT TTTAAATTTA
1151 AATCTAGCGA TGGCGAGTTG TTGAAAATTA GCGCGACCGG CGATACGGTT
1201 ACTTTTACGC CGAAAAAAGG TTCGGTACAG GTTGGCGATG ATGGCAAGGC
1251 TTCAATTTCA AAAGGTGCAA ATACAACTGA AGGTTTGGTT GAGGCTTCTG
1301 AATTGGTTGA AAGCCTGAAC AAACTGGGTT GGAAAGTAGG GGTTGAGAAA
1351 GTCGGCAGCG GCGAGCTTGA TGGTACATCC AAGGAAACTT TAGTGAAGTC
1401 GGGCGATAAA GTAACTTTGA AAGCCGGCGA CAATCTGAAG GTCAAACAAG
1451 AGGGCACAAA CTTCACTTAC GCGCTCAAAG ATGAATTGAC GGGCGTGAAG
1501 AGCGTGGAGT TTAAAGACAC GGCGAATGGT GCAAACGGTG CAAGCACGAA


1551 GATTACCAAA GACGGCTTGA CCATTACGCT GGCAAACGGT GCGAATGGTG
1601 CGACGGTGAC TGATGCCGAC AAGATTAAAG TTGCTTCGGA CGGCATTAGC
1651 GCGGGTAATA AAGCAGTTAA AAACGTCGCG GCAGGCGAAA TTTCTGCCAC


1701 [illegible] GCGATTAACG GAAGCCAGTT GTATGCCGTG GCAAAAGGGG


1751 TAACAAACCT TGCTGGACAA GTGAATAATC TTGAGGGCAA AGTGAATAAA
1801 GTGGGCAAAC GTGCAGATGC AGGTACTGCA AGTGCATTAG CGGCTTCACA
1851 GTTACCACAA GCCACTATGC CAGGTAAATC AATGGTTTCT ATTGCGGGAA
1901 GTAGTTATCA AGGTCAAAAT GGTTTAGCTA TCGGGGTATC AAGAATTTCC
1951 GATAATGGCA AAGTGATTAT TCGCTTGTCT GGCACAACCA ATAGTCAAGG
2001 TAAAACAGGC GTTGCAGCAG GTGTTGGTTA CCAGTGG
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1 


MNKIFNVIWN VVTQTWVVVS ELTRTHTKCA SATVAVAVLA TLLSATVQAN
51 ATDENEDDEE ELEPVQRSVL RWSFKSAKEG TGEQEGTTEV INLNTDSSGN
101 AVGSSTITFK AGDNLKIKQS GNDFTYSLKK ELKNLTSVET EKLSFGANGN
151 KVDITSDANG LKLAKTGNGN GQNSNVHLNG IASTLTDTLA GGTTGHVDTN
201 IDAVNYHRAA SVQDVLNSGW NIQGNGNNVD FVRTYDTVDF VNGANANVSV
251 TADTAHKKTT VRVDVTGLPV QYVTEDGKTV VKVGNEYYKA KDDGSADMNQ
301 KVENGELAKT KVKLVSASGT NPVKISNVAD GTEDTDAVSF KQLKALQDKQ
351 VTLSTSNAYA NGGTDNDGGK ATQTLSNGLN FKFKSSDGEL LKISATGDTV
401 TFTPKKGSVQ VGDDGKASIS KGANTTEGLV EASELVESLN KLGWKVGVEK
451 VGSGELDGTS KETLVKSGDK VTLKAGDNLK VKQEGTNFTY ALKDELTGVK
501 SVEFKDTANG ANGASTKITK DGLTITLANG ANGATVTDAD KIKVASDGIS
551 AGNKAVKNVA AGEISATSTD AINGSQLYAV AKGVTNLAGQ VNNLEGKVNK
601 VGKRADAGTA SALAASQLPQ ATMPGKSMVS IAGSSYQGQN GLAIGVSRIS
651 DNGKVTIRLS GTTNSQGKTG VAAGVGYQW







FIG. 15



SUBSTITUTE SHEET (RULE 26) 



WO 96/30519 



PCT/US96/04031






1 MNKIFNVTWNVVTQTW\nnrSELTRlOT 50 

M I I IN I I II I I I II I I II llll lll M ill I II llll I 111 I I 1 1:1 I 
1 MNKIFNVIWNVVTQTWVVVSELTRTHTKCASATVAVAVIA 50 

• • • • • 

51 NNTPVTNKIiKAY GDANFNFTNNS IADAEKQVQEAYKGLLNLNEKNASDKL 100 

. . : . •|.|.:::,. 1*1. : 

51 ATDEKEDDEEELEPVQRSVLRWSFKSAKEG. 80 

• • . . . 
101 LVEDNTAATVGNLRKLGWVI*SSKNGTRNEK^ 150 

. I : : III • • : • I : I .::.:. : 

81 TGEQEGTTEVINL NTDSSGNAVGSSTITFKAGDNLKI 117 

• . . • * 
151 TSTSENGKHTITPALAKDIiGVKTATVSDTLTIGGGAAAGATTTPKVNVTS 200 

. lit. I 1:1 I.. .:.l.:l:.:. Il::ll 

118 KQSGND .... FTYSLKKELKNLTSVETEKLSPGANGN KVDITS 156 

. ♦ • • • 

201 TTIXyLRFAKDAAGANGDTTVHUVGIGSTLTDTLVGSPATHIIX^DQSTOT 250 

. . : | I I : I I . : . I . . . : . . I I I I I I : I I I I II I . I : . . . I : I . . : . . : I 
157 DANGLKIiAKTGNGNGQNSNVHLNGIASTLTDTLAGGTTGHVDTNXDAVNY 206 

• • . • • 
251 TRAASIKDVLNAGWNIKGVKAGSTTGQSENV 300 

I I I I : . I I II . I I I I . | . : : | | I I I : I I I I I : I : • : 

207 HRAASVQDVLNSGWNIQ GNGNNVDFVRTYDTVDFVNGANANV 248 

• • • . • 
301 TVTVDSKENGKRTEVKIGAKTSVIKEKIXnCLFTGKANK^ 350 

.11.1 I.:.: . . ..Ill .. :l. . I::. I. 

249 SVTADTAHKKTTVRVDVTGL P VQ YVTEDGKTWKVGNE YYKAKDDG S ADM 298 

... . • • 
351 DADEGKGLVTAKDVIDAVNKTGWRIKTTDANGQNGDFATVA SG 393 

: . . : . I : . . . I : I ..:..:..: I * . I * : 

299 NQKVENGELAKTKVKLVS ASGTNPVKI SNVADGTEDTDAVSFKQLKALQD 348 

• • . * * 

394 TNVTFASGNGTTATVTNG TDGITVKYDAKVGDGLKLDGDKI 434 

. . I | ; . . : I : I : . . : I : . . I : I : II:.: 

349 KQVTLSTSNAYANGGTDNDGGKATQTLSNGIiNFKFKSSDGELLKISA. . . 395 

• • • • • 
435 AADTTALTVNDG. .IQIANNPKGKVADVASTDEKKLVTAKGLVTALNSLSW 482 

.:||. .:!...! . : : : . I : . : . . . I . I . I I I . I . : I I . . I I • I : I 
396 TGDTVTFTPKKGSVQVGDDGKAS I SKGANTTE . GLVEASELVESLNKLGW 444 

483 TTTAAEADGGTLIXSIOISEQEVKAGDKVTF 532 

::|.|||...l I Kl II I 1:11 I.I I II I II I. II I I.I.I 

445 KVGVEKVGSGELIXSTSKETLVKSGDKVTLKAGDNLKVKQEGTNFTYALro 494 

533 ALTGLTSITL . . . GTGNNGAKTEINKDGLTIT ... PANGAGANNANTISV 576 

.Ml:. I:.: : . I . I I I . I . I . I I I I I I I . I I I I . . . : I : . I . I 
495 ELTGVKSVEFKOTANGANGASTKITKIX3LTITIANGANGATVTDADKIKV 544 

577 TiaXSISAGGQSVKNWSGLKKFGDANFDPLTSSADNLTKQNDDAYKGLTN 626 
..llllll. ..II 

545 ASDGISAGNKAVK 557 



FIG. 16A




977 KV*N(roiSATSTDAINGSQLYAVAKGVTNLAGQVNNI^(^VllKVGKRADA 1026 
MM s | | M I I I I I I I I I I I M I I I I I I I I I I I I I I I I I H M I I I I I I ■ 
558 NVAAGEISATSTDAINGSQLYAVAKGVTNLAGQVNNLEGKVNKVGKRADA 607 

1027 GTASALAASQLPQATMPGKSMVAIAGSSYQGQNGLAIGVSRISDNGKVII 1076 
IIMIIIIMIIIIIIIIIMI.IIIIIIIIIIIMIIIIIIIIIMIII 
608 GTASALAASQLPQATMPGKSMVSIAGSSYQGQNGLAIGVSRISDNGKVTI 657 

1077 RLSGTTNSQGKTGVAAGVGYQW 1098 

I I I I I I I I I I I I I 1 I I I I I ■■ ■ 
658 RLSGTTNSQGKTGVAAGVGYQW 679 

FIG. 16B






INTERNATIONAL SEARCH REPORT 



International Application No 

PCT/US 96/04031 



A. CLASSIFICATION OF SUBJECT MATTER 

IPC 6 C12N15/31 C07K14/285 A61K39/102 C07K16/12 //(C12N15/31, 
C12R1:21) 

According to International Patent Classification (IPC) or to both national classification and IPC 



B. FIELDS SEARCHED 

Minimum documentation searched (classification system followed by classification symbols) 

IPC 6 C07K 

Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched 

Electronic data base consulted during the international search (name of data base and, where practical, search terms used) 



C. DOCUMENTS CONSIDERED TO BE RELEVANT 



Category / Citation of document, with indication, where appropriate, of the relevant passages / Relevant to claim No. 

WO, A, 92 19936 (MICROCARB INC) 9 July 1992 
see claims 19-15,25-36 
Relevant to claim No.: 1,6,13-16,19 

WO, A, 94 09149 (MICROCARB INC; KRIVAN HOWARD C (US); SAMUELS JAMES E (US); NORBERG) 6 January 1994 
see claims 5,7-23 
Relevant to claim No.: 1-6,13-16,19 



Further documents are listed in the continuation of box C. 

Patent family members are listed in annex. 

* Special categories of cited documents: 

"A" document defining the general state of the art which is not considered to be of particular relevance 

"E" earlier document but published on or after the international filing date 

"L" document which may throw doubts on priority claim(s) or which is cited to establish the publication date of another citation or other special reason (as specified) 

"O" document referring to an oral disclosure, use, exhibition or other means 

"P" document published prior to the international filing date but later than the priority date claimed 

"T" later document published after the international filing date or priority date and not in conflict with the application but cited to understand the principle or theory underlying the invention 

"X" document of particular relevance; the claimed invention cannot be considered novel or cannot be considered to involve an inventive step when the document is taken alone 

"Y" document of particular relevance; the claimed invention cannot be considered to involve an inventive step when the document is combined with one or more other such documents, such combination being obvious to a person skilled in the art 

"&" document member of the same patent family 



Date of the actual completion of the international search 

19 August 1996 

Date of mailing of the international search report 

03.09.96 

Name and mailing address of the ISA 

European Patent Office, P.B. 5818 Patentlaan 2 
NL - 2280 HV Rijswijk 
Tel. (+31-70) 340-2040, Tx. 31 651 epo nl, 
Fax: (+31-70) 340-3016 

Authorized officer 

Gurdjian, D 

Form PCT/ISA/210 (second sheet) (July 1992) 

page 1 of 2 



INFECT. IMMUN., 1992, 
V o °ages 60 13Q5-i3 4 13. XP099578343 

pertussis" 

see the whole document 
INFECTION AND IMMUNITY 62 (8). 1994. 

assss 'sf 5 *!- i " f '» miM are 

part of gene clusters 
see the whole document 

■>\ 1994 178A.. XP808579256 

Haemophilus influenzae (NTHI) 
see abstract 

FLEISCHMANN R D ET AL: "Whole-genome 
random sequencing and assembly of 
Haemophilus influenzae Rd" 
see example ADHESIN 

Haemophilus influenzae 
see the whole document 

BRINTON) 1 February 1996 
see claims 10-12 



INTERNATIONAL SEARCH REPORT 



International application No. 
PCT/US96/04031 



Box I Observations where certain claims were found unsearchable (Continuation of item 1 of first sheet) 

This international search report has not been established in respect of certain claims under Article 17(2)(a) for the following reasons: 

1. Claims Nos.: 
because they relate to subject matter not required to be searched by this Authority, namely: 

Remark: Although claims 20 - 21 are directed to a method of treatment of the human/animal body, the search has been carried out and based on the alleged effects of the compound/composition. 

2. Claims Nos.: 
because they relate to parts of the international application that do not comply with the prescribed requirements to such an extent that no meaningful international search can be carried out, specifically: 

3. Claims Nos.: 
because they are dependent claims and are not drafted in accordance with the second and third sentences of Rule 6.4(a). 



Box II Observations where unity of invention is lacking (Continuation of item 2 of first sheet) 

This International Searching Authority found multiple inventions in this international application, as follows: 

1. As all required additional search fees were timely paid by the applicant, this international search report covers all searchable claims. 

2. As all searchable claims could be searched without effort justifying an additional fee, this Authority did not invite payment of any additional fee. 

3. As only some of the required additional search fees were timely paid by the applicant, this international search report covers only those claims for which fees were paid, specifically claims Nos.: 

Remark on Protest 

The additional search fees were accompanied by the applicant's protest. 

No protest accompanied the payment of additional search fees. 

Form PCT/ISA/210 (continuation of first sheet (1)) (July 1992) 



INTERNATIONAL SEARCH REPORT 
Information on patent family members 
International Application No PCT/US 96/04031 





